Device I/O (mmap, direct I/O, and asynchronous I/O)<br>

It is now common on Linux for drivers to be written in user space: the X server is one example, and some vendors' private drivers are another. This means that hardware access is implemented in user space, usually by using mmap to map device memory into the address space of the user process, so that the user program can access the hardware by reading and writing memory.

The kernel usually buffers I/O operations for better performance, but it also provides direct I/O and asynchronous I/O.

When interacting with hardware, some devices support DMA, which reduces the load on the processor. Other hardware cannot be reached through ordinary memory reads and writes and requires special instructions.

When using mmap, a block of kernel addresses has to be mapped into a user address space. This involves a key data structure, the virtual memory area (VMA). A VMA represents a homogeneous region of a process's virtual address space: a contiguous range of virtual addresses with the same permissions, backed by the same object (a file or swap space).

The memory areas of a process can be viewed by reading /proc/<pid>/maps. The format is as follows:

start-end perm offset major:minor inode image
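A line in this format can be split into its fields with sscanf. The following is only a sketch: the struct map_entry type and the parse_maps_line function are names invented here for illustration, and the 255-character limit on the image name is an arbitrary assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one line of /proc/<pid>/maps. */
struct map_entry {
    unsigned long start, end;   /* virtual address range */
    char perm[5];               /* e.g. "rw-p" */
    unsigned long offset;       /* offset into the mapped file */
    unsigned int major, minor;  /* device holding the mapped file */
    unsigned long inode;        /* inode of the mapped file */
    char image[256];            /* file name; empty for anonymous areas */
};

/* Parse one line; returns 0 on success, -1 if the line is malformed. */
static int parse_maps_line(const char *line, struct map_entry *e)
{
    int n;

    assert(line != NULL && e != NULL);
    e->image[0] = '\0';
    n = sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %255[^\n]",
               &e->start, &e->end, e->perm, &e->offset,
               &e->major, &e->minor, &e->inode, e->image);
    /* The image field is optional, so seven converted fields are enough. */
    return n >= 7 ? 0 : -1;
}
```

A caller would typically open /proc/self/maps with fopen, read it line by line with fgets, and pass each line to parse_maps_line.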

This is a map of the init process:

(six entries: two for /sbin/init.sysvinit, the second of them read-write; an anonymous region; the read-write [heap]; another read-write anonymous region; and the read-write [stack])

The meaning of each part is as follows:



start-end: the starting and ending virtual addresses of the memory area.

perm: a bit mask with the read, write, and execute permissions of the memory area. The last character is p for private or s for shared.

offset: the offset in the mapped file at which the memory area begins.

major:minor: the major and minor numbers of the device holding the mapped file. For a mapped device, the major and minor numbers refer to the disk partition holding the device special file, not to the major and minor numbers that the kernel assigned to the device itself.

inode: the inode number of the mapped file.

image: the name of the mapped file.

The data structure corresponding to a VMA is vm_area_struct. Its operations include the following function pointers:

1.1 open

Its prototype is: void (*open)(struct vm_area_struct *vma);

The kernel calls it whenever a new reference to the VMA is created, which gives the kernel component that implements the VMA a chance to do its own initialization. It is not called when the VMA is first created by mmap; in that case the component's mmap function is called instead.

1.2 close

Its prototype is: void (*close)(struct vm_area_struct *vma);

It is called when the memory area is destroyed. There is no usage count associated with a VMA: each process that uses the area opens and closes it exactly once.

1.3 nopage

Its prototype is: struct page *(*nopage)(struct vm_area_struct *vma, unsigned long address, int *type);

When a process tries to access a page that belongs to a valid VMA but is not currently in memory, the kernel calls the VMA's nopage function. The function returns a pointer to the struct page of the physical page. If the VMA does not define its own nopage method, the kernel allocates an empty page.
II. mmap

mmap allows a device to be mapped into user-space memory so that a user program can access the hardware. The mmap operation has to be implemented by the driver in the kernel. Once the mapping is in place, a user program that reads or writes the given range of memory is in fact reading or writing the device.

Not all hardware supports mmap; serial devices, for example, do not. mmap also has a limitation: its granularity is the page size. Because the kernel manages virtual memory at the page-table level, a user process that maps device memory into its virtual address space must do so in units of pages. The mapped region must be a multiple of the page size, and the physical address it starts at must be an integer multiple of the page size.
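The page-granularity rule can be made concrete with a little arithmetic. The sketch below assumes a power-of-two page size; struct map_window and page_window are names invented for this illustration. It computes the page-aligned base and length that cover an arbitrary address range, plus the offset of the wanted address inside the mapping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: the page-aligned window covering a range. */
struct map_window {
    unsigned long base;   /* addr rounded down to a page boundary */
    size_t len;           /* length rounded up to whole pages */
    size_t delta;         /* offset of addr inside the window */
};

/* Compute the smallest page-aligned window that covers [addr, addr + len). */
static struct map_window page_window(unsigned long addr, size_t len,
                                     unsigned long page)
{
    struct map_window w;

    /* The masking below only works for a power-of-two page size. */
    assert(page != 0 && (page & (page - 1)) == 0);
    w.base = addr & ~(page - 1);
    w.delta = (size_t)(addr - w.base);
    w.len = (w.delta + len + page - 1) & ~(page - 1);
    return w;
}
```

In a real program the page size would come from sysconf(_SC_PAGESIZE), base and len would be passed to mmap as the offset and length, and the register or buffer of interest would then be found at the returned pointer plus delta.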
The control registers of most PCI peripherals are mapped to memory addresses. For such a device, the hardware functions can be controlled simply by mapping that memory into user space, which is very attractive compared with the traditional ioctl approach.
mmap is part of the file_operations structure. Because everything is a file on *nix, any kernel component can easily implement its own mmap through this structure.
The system call as seen by a user-space program:
mmap(caddr_t addr, size_t len, int prot, int flags, int fd, off_t offset)
This invokes the mmap function of the file behind fd. During the mmap system call, the kernel does some preparatory work before calling fd's mmap function. The prototype of that function in the kernel is as follows:
int (*mmap)(struct file *filp, struct vm_area_struct *vma);
filp is the file being mapped, and vma contains the virtual address range through which the device will be accessed. The mmap function of fd has to build suitable page tables for the virtual address range in vma and initialize the vma's function pointers so that the appropriate functions are called later.
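From user space the call looks like the sketch below. It maps an ordinary file instead of a device node, because no particular device is assumed here; with a device file the sequence of calls would be the same, provided the driver implements mmap. The function name write_via_mmap is invented for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Store msg at the start of the file behind fd through a shared mapping
 * of maplen bytes. Returns 0 on success, -1 on failure.
 */
static int write_via_mmap(int fd, const char *msg, size_t maplen)
{
    size_t n = strlen(msg);
    char *p;

    assert(n <= maplen);
    /* A regular file must be at least as large as the mapping. */
    if (ftruncate(fd, (off_t)maplen) < 0)
        return -1;

    p = mmap(NULL, maplen, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;

    memcpy(p, msg, n);          /* a plain memory write reaches the file */

    if (msync(p, maplen, MS_SYNC) < 0) {
        munmap(p, maplen);
        return -1;
    }
    return munmap(p, maplen);
}
```

MAP_SHARED is what makes the stores visible outside the process; with MAP_PRIVATE the write would only touch a private copy.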
2.1 Setting up the page table

Setting up the page table is one of the most important tasks that mmap has to complete. There are two ways to build the page table:

All at once, by calling remap_pfn_range

One page at a time, through the nopage function

2.1.1 Using remap_pfn_range

remap_pfn_range and ioremap_page_range create new page tables for a range of physical addresses. Their prototypes are as follows:

int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn, unsigned long size, pgprot_t prot);

It maps size bytes of physical memory, starting at page frame pfn, to the virtual address addr (size is aligned to the page size). Because the size is a parameter, the function can be used to map an entire region or only part of it.



vma: the user VMA into which the physical addresses are mapped

addr: the user-space virtual address at which the mapping starts, usually vma->vm_start, although it does not have to be

pfn: the page frame number of the physical address to map

size: the size of the area to map

prot: the protection attributes of the mapped pages

int ioremap_page_range(unsigned long addr, unsigned long end,
phys_addr_t phys_addr, pgprot_t prot);

It maps I/O memory starting at physical address phys_addr to the virtual address range from addr to end (aligned to the page size).



addr: the starting virtual address

end: the ending virtual address

phys_addr: the starting physical address

prot: the protection attributes of the mapped pages

The difference between the two is this: when the address to be mapped into user space is real memory, use remap_pfn_range; when it is I/O memory, use ioremap_page_range. Note that I/O memory is usually not cached by the kernel.

2.1.2 Mapping memory with nopage

Building the entire page table at once is a good choice in most cases, but in some cases nopage is more appropriate because it is more flexible. Two typical scenarios for nopage are as follows:

The application calls the mremap system call to change the bounds of a mapped region. When the region shrinks, the kernel does not notify the driver and simply flushes the pages that are no longer needed. When the region grows, however, the kernel calls the nopage method to obtain the new pages. In this sense, a driver that wants to support the mremap system call must implement the nopage method.

When the user accesses a page in the VMA that is not in memory, the nopage function is called. The nopage function should return a pointer to the struct page and increment the page's reference count to indicate that the page is in use.

The type parameter of nopage, if it is not NULL, can be used to return the type of the fault; the value is typically VM_FAULT_MINOR. Because nopage has to return a struct page pointer and PCI memory space has no struct page, the nopage method cannot be used for the PCI address space.

On success, nopage returns a pointer to a struct page; otherwise it returns an error. If the nopage function is NULL, the kernel's fault-handling code maps the zero page to the faulting virtual address. The zero page is a special page: reading it returns 0, and writing to it modifies a copy that is private to the process.
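The zero-page behaviour and the growth case of mremap can be observed from user space without any driver. The sketch below does not exercise a driver's nopage method; it assumes an anonymous private mapping, for which the kernel's own fault handling supplies the pages. The function name demand_zero_demo is invented for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 0 if every observation matched the description, -1 otherwise. */
static int demand_zero_demo(void)
{
    long psz = sysconf(_SC_PAGESIZE);
    size_t page;
    unsigned char *p, *q;
    int ok;

    assert(psz > 0);
    page = (size_t)psz;

    p = mmap(NULL, page, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    ok = (p[0] == 0);            /* first read: the page reads as zero */
    p[0] = 0x5a;                 /* first write: the process gets its own page */
    ok = ok && (p[0] == 0x5a);

    /* Grow the region; the added page is only populated when touched. */
    q = mremap(p, page, 2 * page, MREMAP_MAYMOVE);
    if (q == MAP_FAILED) {
        munmap(p, page);
        return -1;
    }
    ok = ok && (q[0] == 0x5a) && (q[page] == 0);

    munmap(q, 2 * page);
    return ok ? 0 : -1;
}
```

For a device mapping, the read at q[page] is the point at which the driver's nopage method would be asked for the new page.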

2.2 Adding VMA operations

The other important task of mmap is to set the VMA's function pointers, namely the nopage, open, and close pointers.

2.3 Remapping RAM

remap_pfn_range can only be used on reserved pages and on physical addresses above the top of physical memory, that is, on memory that is not managed by the memory management system. It therefore cannot map conventional memory, including memory obtained with __get_free_page. If you want to use it to map a block of RAM, that memory has to be set aside at system boot. (A region mapped with remap_pfn_range can be read and written directly by the process, while memory managed by the kernel's memory management system may be allocated for other purposes, which is a potential conflict.)

Although remap_pfn_range cannot map RAM into user space, the VMA nopage method offers an alternative: it maps memory into the user address space one page at a time. A kernel component that wants to map its memory into a user address space implements the nopage interface and returns the corresponding struct page from it.


Note that nopage has to return a real struct page, so the actual page pointer has to be found. For regular kernel memory it can be obtained with virt_to_page; for an address returned by vmalloc, use vmalloc_to_page.

III. Direct I/O

Most I/O operations pass through kernel buffers. This improves I/O efficiency, but in some cases buffering does not give good performance, so the kernel also provides APIs for I/O that bypasses the buffers. A peripheral driver that does not want to use the kernel's buffering mechanism can use the following API:

long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
unsigned long start, unsigned long nr_pages, int write,
int force, struct page **pages, struct vm_area_struct **vmas);
This function pins pages of the user process in memory and makes them available to the kernel, so that kernel code can then access the pages directly. The parameters have the following meanings:

tsk: pointer to the task performing the I/O. It tells the kernel which task is charged for any page faults incurred; it can be NULL if this does not need to be recorded.

mm: pointer to the memory management structure that describes the address space to be mapped

start: the starting address in the user address space

nr_pages: the number of pages

write: whether the caller intends to write to the pages

force: if set, write access is forced even when the user mapping is read-only. This is usually not the desired effect.

pages: an array of page pointers with room for at least nr_pages entries, or NULL if the caller does not want this information

vmas: an array of pointers to the VMA corresponding to each page, or NULL if the caller does not want this information

Because this function has to set up page table mappings, it is relatively expensive. Direct I/O bypasses the kernel buffers, and without kernel buffering it tends to be used together with asynchronous I/O. With synchronous direct I/O, the user only learns that the operation has completed, and that the submitted buffer can be reused, when the call returns. The call has to wait for the I/O to finish, which in most cases is not what the user wants, because I/O is slow and the CPU time spent waiting is wasted.

For block device drivers and network drivers, the framework code already uses direct I/O where appropriate, so driver authors generally do not need to deal with direct I/O themselves. For character drivers, direct I/O is clearly not attractive, since a character stream is not page-oriented.

Note in particular that mmap_sem must be held when calling this function. After the direct I/O operation completes, the pages must be released. If the pages were modified, SetPageDirty must be called to mark them dirty; otherwise the kernel assumes that their contents are unchanged and does not write them back to the corresponding device or file, which is usually wrong. The pages are released with page_cache_release.
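The text above covers the kernel side of direct I/O. From user space, direct I/O is normally requested with the O_DIRECT open flag, which this article does not mention, so the sketch below is an assumption about how a caller would exercise it. O_DIRECT generally requires the buffer, file offset, and length to be aligned to a block size that depends on the device; the helper names dio_alloc and dio_read are invented for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Allocate len bytes aligned to align, which must be a power of two. */
static void *dio_alloc(size_t align, size_t len)
{
    void *buf = NULL;

    assert(align != 0 && (align & (align - 1)) == 0);
    if (posix_memalign(&buf, align, len) != 0)
        return NULL;
    return buf;
}

/*
 * Read up to len bytes from the start of path, bypassing the page cache.
 * Returns the number of bytes read, or -1 on failure. Some filesystems
 * reject O_DIRECT, in which case open or read fails.
 */
static ssize_t dio_read(const char *path, void *buf, size_t len)
{
    int fd = open(path, O_RDONLY | O_DIRECT);
    ssize_t n;

    if (fd < 0)
        return -1;
    n = read(fd, buf, len);
    close(fd);
    return n;
}
```

A caller would allocate the buffer with dio_alloc, using an alignment suitable for the target device, pass it to dio_read with a length that is a multiple of that alignment, and release it with free.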

IV. Asynchronous I/O (AIO)

In addition to direct I/O, the kernel provides another I/O facility: asynchronous I/O. Asynchronous I/O allows a user program to initiate one or more I/O operations without waiting for them to complete. The kernel provides a set of APIs that let user programs start AIO.

4.1 User-space interface

The kernel provides the following interfaces to user space:

io_setup: creates an asynchronous I/O context for the current process. Its parameter specifies the maximum number of asynchronous I/O requests that can be submitted to the context.

io_submit: submits one or more asynchronous I/O requests.

io_getevents: retrieves the completion status of submitted asynchronous I/O requests.

io_cancel: cancels a submitted asynchronous I/O request.

io_destroy: destroys an asynchronous I/O context created by the process.

These interfaces are system calls, defined in aio.h and aio.c. As their meanings suggest, an application that wants to use asynchronous I/O first creates an asynchronous I/O context and then submits requests to that context. A process can create multiple contexts; they are stored in task_struct->mm->ioctx_list. The process can then submit asynchronous I/O requests in a context, query their status with io_getevents, and optionally cancel them with io_cancel. When it is finished, it destroys the context with io_destroy.

4.2 Kernel implementation

4.2.1 Asynchronous I/O context

The kernel uses kioctx to represent an asynchronous I/O context; the information for a context created by the user is stored there. After successfully creating a context, the kernel returns an ID to the user process, which then uses that ID to refer to the context. When creating a context, the kernel also creates an AIO ring. The AIO ring is a memory buffer in the address space of the user process that is also accessed by the kernel; the kernel obtains its pages with get_user_pages. The AIO ring is a circular buffer that the kernel uses to report asynchronous I/O completions. The user process can also check for completions directly in user mode, which avoids the overhead of a system call.

4.2.2 Asynchronous I/O requests

The kernel uses kiocb to represent an asynchronous I/O request, while the iocb data structure represents the request on the user-process side; the kernel converts between the two. A single io_submit call can submit multiple asynchronous I/O requests.

The kernel processes each request in turn according to its type and calls the corresponding asynchronous I/O function in the file's file_operations. If that function returns a value other than -EIOCBQUEUED, the AIO framework calls aio_complete directly and returns. Otherwise the request is truly asynchronous, and the component that returned -EIOCBQUEUED is responsible for calling aio_complete after the I/O request completes. As the kernel implementation shows, a component that wants to support asynchronous I/O has to implement the asynchronous I/O interface of file_operations correctly (the asynchrony can be achieved with mechanisms such as workqueues) and call aio_complete once the I/O is done.

4.2.3 Collecting asynchronous I/O status

When the user process collects asynchronous I/O status through the system call, the kernel handles the request in read_events. There it sleeps on a wait queue until it is woken by aio_complete, interrupted, or timed out.

4.2.4 Cancelling asynchronous I/O requests

To support cancellation, the implementer of the I/O operation has to call kiocb_set_cancel_fn to set a cancellation function. When the user asks to cancel the request, the AIO framework calls that function to cancel the asynchronous I/O request.

4.2.5 Destroying the asynchronous I/O context

To destroy an asynchronous I/O context, the AIO framework, in kill_ioctx, wakes up the processes waiting on the context and then releases the associated data structures.
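The user-space interface from 4.1 can be exercised with a short program. glibc provides no wrappers for these system calls, so the sketch below invokes them through syscall(2) and takes the structure definitions from <linux/aio_abi.h>. The function name aio_read_once is invented for this example, and the sketch assumes that the kernel and any sandbox in use permit the AIO system calls.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Submit one asynchronous read and wait for its completion.
 * Returns the number of bytes read, or a negative value on failure.
 */
static long aio_read_once(int fd, void *buf, size_t len, long long offset)
{
    aio_context_t ctx = 0;              /* must be zero before io_setup */
    struct iocb cb;
    struct iocb *list[1] = { &cb };
    struct io_event ev;
    long res = -1;

    assert(buf != NULL);

    /* Create a context that can hold one in-flight request. */
    if (syscall(SYS_io_setup, 1U, &ctx) < 0)
        return -1;

    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = (uint32_t)fd;
    cb.aio_lio_opcode = IOCB_CMD_PREAD;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = offset;

    /* Submit the request, then block until one completion event arrives. */
    if (syscall(SYS_io_submit, ctx, 1L, list) == 1 &&
        syscall(SYS_io_getevents, ctx, 1L, 1L, &ev, NULL) == 1)
        res = (long)ev.res;             /* byte count or negative errno */

    syscall(SYS_io_destroy, ctx);
    return res;
}
```

A real application would keep the context alive across many requests and do other work between io_submit and io_getevents; creating and destroying a context for every read, as done here, only keeps the sketch short.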
Copyright statement: This article is the blogger's original work and may not be reproduced without the blogger's consent.
