
Chapter 6 - Special-Purpose File Operations

UNIX Systems Programming for SVR4
David A. Curry
 Copyright © 1996 O'Reilly & Associates, Inc.

Memory-Mapped Files
The concept of memory-mapped files was first introduced in UNIX by Berkeley in 4.2BSD (although Berkeley did not actually implement the concept until 4.4BSD). It has since been adopted by most vendor versions of the operating system, including SVR4. A memory-mapped file is basically what its name implies: a file (or portion of a file) that has been mapped into a process' address space.
Once a file is mapped into memory, a process can access the contents of that file using address space manipulations (that is, variables, pointers, array subscripts, and so on) instead of the read/write interface. The operating system takes care of transferring the file into memory (and, if the memory is modified, transferring it back to the file) through the virtual memory subsystem. In other words, as the process accesses the file, the operating system pages the file into and out of memory. This is usually (but not always) more efficient than reading the entire file into memory directly, especially when only small portions of the file's contents are actually used.
One of the most important uses for memory-mapped files is in the implementation of dynamically loadable shared libraries. In the old days, when a program was linked, all the executable code for the library routines it called (the code for the routines described in this book) was copied into the executable file. This consumed a lot of disk space and also took up a lot of memory, because there might be multiple copies of a routine (for example, printf) in memory at any given time. The introduction of dynamically loaded, shared libraries has solved both of these problems. Because the library is dynamically loaded, it does not have to be compiled into each program. Rather, when the program is executed, the system loads the library into memory and allows the program to transfer control to this area of memory. This conserves disk space by having only one copy of each library routine on the disk. Because the library is shared, each program that uses the library is using the same copy. Thus, there is only one copy of printf in memory at a time, and all programs that need it use the same copy.
Dynamically loadable shared libraries are implemented with memory-mapped files. When a program is linked, a jump table is created that contains an entry for each library routine. When the program is executed, the operating system maps the library into memory and then edits the jump table to fill in the address of each function. As the program calls library functions, the operating system pages those parts of the library into memory and lets the program use them. If part of the library is never used (for example, the part taken up by some obscure function), it is never loaded into memory.
Memory-mapped files are useful for other purposes, too. For example, a program that retrieves data from a very large database might use some type of index into the database. It searches for an item in the index, and when it finds the item, uses information stored in the index entry to retrieve the data. Indexes for large databases are usually very large themselves. If the program must retrieve only one or two items from the database, it is unlikely that it needs to examine each and every entry in the index (depending on its search algorithm). Thus, it would be a waste of both time and memory to read the entire index into memory. Instead, the program can map the index into memory, access it as if it were an array (or whatever), and the operating system only brings in those parts of the index the program actually needs. This both makes the program run faster and places less load on the system.
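The index lookup just described can be sketched as follows, using the mmap and munmap functions covered in the sections that follow. This is only an illustration: the index_entry record layout and the index_lookup function are hypothetical, the index is assumed to be a file of fixed-size records sorted by key, and mmap is declared here with the void * types used by current POSIX systems rather than caddr_t.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical on-disk index record; entries are sorted by key. */
struct index_entry {
    long key;       /* search key */
    long offset;    /* position of the record in the database file */
};

/*
 * Look up key in the index file at path.  Returns 0 and stores the
 * record offset in *offset_out if the key is found, 1 if it is
 * absent, and -1 on error (an empty index is treated as an error).
 */
int
index_lookup(const char *path, long key, long *offset_out)
{
    struct stat st;
    const struct index_entry *idx;
    void *base;
    size_t lo, hi, mid;
    int fd, result = 1;

    if ((fd = open(path, O_RDONLY)) < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    base = mmap(0, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping survives the close */
    if (base == MAP_FAILED)
        return -1;

    /* Binary search; only the pages actually probed are paged in. */
    idx = base;
    lo = 0;
    hi = st.st_size / sizeof(struct index_entry);
    while (lo < hi) {
        mid = lo + (hi - lo) / 2;
        if (idx[mid].key < key)
            lo = mid + 1;
        else if (idx[mid].key > key)
            hi = mid;
        else {
            *offset_out = idx[mid].offset;
            result = 0;
            break;
        }
    }

    munmap(base, st.st_size);
    return result;
}
```

A caller would pass the returned offset to lseek (or use it as an offset into a second mapping) to fetch the record itself. A binary search over a large index touches only a handful of pages, which is the case in which mapping is most likely to pay off over reading the whole file.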
Mapping a File into Memory
A file is mapped into memory with the mmap function:
    #include <sys/types.h>
    #include <sys/mman.h>
    caddr_t mmap(caddr_t addr, size_t len, int prot, int flags,
            int fd, off_t offset);
This function maps len bytes of the file referenced by fd, beginning at offset, into the process' address space. It returns a memory address that points to the start of the mapped segment on success, or (caddr_t) -1 on failure. If the call fails, errno contains the reason for failure.
The mapped segment can extend past the end of the file, but a reference to any page that lies wholly beyond the current end of the file results in the delivery of a SIGBUS signal (see Chapter 10, Signals). This means that mmap cannot be used to implicitly extend the length of a file.
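Because a mapping cannot lengthen a file, a program that wants to build a new file through a mapping has to give the file its final length first. The following is a minimal sketch of one way to do that, assuming the ftruncate function is available and using the void * declaration of mmap found on current POSIX systems; the function name map_new_file is hypothetical.

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Create (or truncate) the file at path, give it a length of len
 * bytes, and map it read/write.  Returns the mapped address, or
 * MAP_FAILED on error.
 */
void *
map_new_file(const char *path, size_t len)
{
    void *base;
    int fd;

    if ((fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644)) < 0)
        return MAP_FAILED;

    /*
     * Set the file's length before mapping it.  Without this step
     * the file is empty, and touching the mapping raises SIGBUS.
     */
    if (ftruncate(fd, (off_t) len) < 0) {
        close(fd);
        return MAP_FAILED;
    }

    base = mmap(0, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return base;
}
```

The caller stores data through the returned pointer and calls munmap with the same length when it is finished.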
Note: Mappings established for fd are not removed when the file descriptor is closed. The munmap function (see below) must be called to remove a mapping.
The prot parameter specifies the ways in which the mapped pages may be accessed. These values are ORed together to produce the desired result:
PROT_READ
The page can be read (that is, the contents of the page can be examined).
PROT_WRITE
The page can be written (that is, the contents of the page can be changed).
PROT_EXEC
The page can be executed (that is, the contents of the page can be executed as program code).
PROT_NONE
The page cannot be accessed.
Most implementations of mmap do not actually support all combinations of the above values. They usually map some of the simpler modes into more complex ones (for example, the PROT_WRITE mode is usually implemented as PROT_READ|PROT_WRITE). However, no implementation allows a page to be written unless PROT_WRITE is specified.
The flags parameter provides additional information about how the mapped pages should be treated:
MAP_SHARED
When changes are made to the mapped object, these changes are shared among other processes that also have the object mapped.
MAP_PRIVATE
When changes are made to the mapped object, these changes cause the system to create a private copy of the affected pages, making the changes in the copy. Other processes that have the object mapped cannot see the changes.
MAP_FIXED
Tells the system to map the file into memory exactly at address addr (see below). The use of this flag is discouraged because it can prevent the system from making the most efficient use of system resources.
MAP_NORESERVE
Normally, when MAP_PRIVATE mappings are created, the system reserves swap space equivalent to the size of the mapping. This space is used to store the private copies of any modified pages. When this flag is specified, the system does not preallocate space for the modified pages. This means that if swap space for a newly modified page is unavailable, the process receives a SIGBUS signal when it tries to modify that page.
This flag is not available in HP-UX 10.x.
The addr parameter specifies the suggested address at which the object is to be mapped. If addr is given as zero, the system is granted complete freedom to map the object wherever it wants for best efficiency. If addr is non-zero but MAP_FIXED is not specified, it is taken as a suggestion of an address near where the memory should be mapped. If addr is non-zero and MAP_FIXED is specified, it is taken as the exact address at which to map the object.
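The difference between MAP_SHARED and MAP_PRIVATE is easiest to see by changing a byte through each kind of mapping and then looking at the file. The sketch below does this with a hypothetical function, poke_first_byte, which takes the flags value as a parameter. It assumes the file already exists and is not empty, and it uses the void * declaration of mmap found on current POSIX systems.

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Map the first byte of the file at path using the given flags
 * (MAP_SHARED or MAP_PRIVATE), store c there, and remove the
 * mapping.  Returns 0 on success, -1 on error.
 */
int
poke_first_byte(const char *path, int flags, char c)
{
    char *base;
    int fd;

    if ((fd = open(path, O_RDWR)) < 0)
        return -1;

    /* An addr of zero lets the system choose where to map. */
    base = mmap(0, 1, PROT_READ | PROT_WRITE, flags, fd, 0);
    close(fd);
    if (base == MAP_FAILED)
        return -1;

    /*
     * With MAP_SHARED this store reaches the file.  With
     * MAP_PRIVATE it changes only this process' private copy of
     * the page, which is discarded when the mapping is removed.
     */
    base[0] = c;

    return munmap(base, 1);
}
```

Calling poke_first_byte(path, MAP_PRIVATE, 'J') leaves the file unchanged, while the same call with MAP_SHARED replaces the file's first byte.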
Removing a Mapping
A memory mapping is removed with the munmap function:
    #include <sys/types.h>
    #include <sys/mman.h>
    int munmap(caddr_t addr, size_t len);
The mappings for the pages in the range addr to addr+len are removed. Further references to these pages result in the delivery of a SIGSEGV signal to the process (see Chapter 10, Signals). If the unmapping is successful, munmap returns 0; otherwise, it returns -1 and places the reason for failure in the external variable errno.
Example 6-3 shows a program that uses mmap to read files and print them on the standard output (much like the cat command).
Example 6-3:  catmap
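#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
/* Define MAP_FAILED only if <sys/mman.h> does not already provide it. */
#ifndef MAP_FAILED
#define MAP_FAILED      ((void *) -1)
#endif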
int
main(int argc, char **argv)
{
    int fd;
    struct stat st;
    caddr_t base, ptr;
    /*
     * For each file specified...
     */
    while (--argc) {
        /*
         * Open the file.
         */
        if ((fd = open(*++argv, O_RDONLY, 0)) < 0) {
            perror(*argv);
            continue;
        }
        /*
         * Find out how big the file is.
         */
        if (fstat(fd, &st) < 0) {
            perror(*argv);
            close(fd);
            continue;
        }
        /*
         * Map the entire file into memory.
         */
        base = mmap(0, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (base == MAP_FAILED) {
            perror(*argv);
            close(fd);
            continue;
        }
        /*
         * We can close the file now; we can access it
         * through memory.
         */
        close(fd);
        /*
         * Now print the file.
         */
        for (ptr = base; ptr < &base[st.st_size]; ptr++)
            putchar(*ptr);
        /*
         * Now unmap the file.
         */
        munmap(base, st.st_size);
    }
    exit(0);
}
    % catmap /etc/motd
Sun Microsystems Inc.   SunOS 5.3       Generic September 1993
Changing the Protection Mode of Mapped Segments
The mprotect function allows a process to change the protection modes of a previously mapped segment:
    #include <sys/types.h>
    #include <sys/mman.h>
    int mprotect(caddr_t addr, size_t len, int prot);
The addr and len parameters specify the starting address and length of the segment whose permissions are to be changed. The prot parameter specifies the new protection mode to be set on the segment using the PROT_READ, PROT_WRITE, PROT_EXEC, and PROT_NONE flags as described earlier. Upon successful completion, mprotect returns 0; otherwise, it returns -1 and stores the reason for failure in errno.
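One common use of mprotect is to prepare a mapping while it is writable and then remove write permission, so that any later stray store is caught as a fault instead of silently corrupting the data. The sketch below illustrates that pattern under a few assumptions: the function name load_lines_readonly is hypothetical, the file is not empty, and the calls are declared with the void * types used by current POSIX systems.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Map the file at path privately, replace each newline with a null
 * byte so that every line becomes a C string, then make the mapping
 * read-only.  Returns the mapped address and stores its length in
 * *len_out, or returns MAP_FAILED on error.
 */
char *
load_lines_readonly(const char *path, size_t *len_out)
{
    struct stat st;
    char *base;
    size_t i;
    int fd;

    if ((fd = open(path, O_RDONLY)) < 0)
        return MAP_FAILED;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return MAP_FAILED;
    }

    /* MAP_PRIVATE: the changes below never reach the file. */
    base = mmap(0, st.st_size, PROT_READ | PROT_WRITE, MAP_PRIVATE,
            fd, 0);
    close(fd);
    if (base == MAP_FAILED)
        return MAP_FAILED;

    for (i = 0; i < (size_t) st.st_size; i++)
        if (base[i] == '\n')
            base[i] = '\0';

    /* Editing is finished; drop PROT_WRITE from the segment. */
    if (mprotect(base, st.st_size, PROT_READ) < 0) {
        munmap(base, st.st_size);
        return MAP_FAILED;
    }

    *len_out = st.st_size;
    return base;
}
```

The caller removes the mapping with munmap(base, len) when it has finished with the strings.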
Providing Advice to the System
Once a file is mapped into memory, the operating system's virtual memory subsystem is responsible for paging that file into memory. In order to make the mapping more efficient and consume fewer system resources, the madvise function allows a process to give “hints” to the system about how best to page the object into memory:
    #include <sys/types.h>
    #include <sys/mman.h>
    int madvise(caddr_t addr, size_t len, int advice);
The addr and len parameters specify the starting address and length of the segment to which the advice applies. The advice parameter can contain one of the following:
MADV_NORMAL
This is the default mode. The kernel reads all the data from the object (or at least reads a reasonable amount) into pages that are used as a cache. System pages are a limited resource, and the kernel steals pages from other mappings if necessary. This can adversely affect system performance when large amounts of memory are accessed, but in general it is not a problem.
MADV_RANDOM
The process jumps around in the object, and may access a tiny bit here and then a tiny bit there. This tells the kernel to read in a minimum amount of data from the mapped object on any particular access, rather than reading larger amounts in anticipation of other accesses within the same locality.
MADV_SEQUENTIAL
The program is planning to access the object in order from lowest address to highest, and each address is likely to be accessed only once. The kernel frees the resources from the mapping as quickly as possible. (The catmap program could use this option to increase performance.)
MADV_WILLNEED
Tells the system that a specific address range is definitely needed, so that it can start reading the specified range into memory. This can benefit programs that need to minimize the time needed to access memory the first time.
MADV_DONTNEED
Tells the kernel that a specific address range is no longer needed, so that it can begin freeing the resources associated with that part of the mapping.
With the exception of MADV_DONTNEED, these constants are not supported in IRIX 5.x.
Synchronizing Memory with Physical Storage
When an object is mapped, the system maintains both an image of the object in memory and a copy of the image in backing storage. The backing storage copy is maintained so that the system can allow other processes to use the physical memory when it is their turn to run. The backing storage for a MAP_SHARED mapping is the file to which the mapping is attached; the backing storage for a MAP_PRIVATE mapping is its swap area. The msync function tells the system to synchronize the in-memory copy of the mapping with its backing storage (the system does this periodically on its own, but some programs need to have the object in a known state):
    #include <sys/types.h>
    #include <sys/mman.h>
    int msync(caddr_t addr, size_t len, int flags);
The addr and len parameters specify the starting address and length of the segment to synchronize. The flags parameter consists of one or more of the following values ORed together:
MS_ASYNC
This causes all writes to be scheduled, after which msync returns. The writes are completed a short time afterward.
MS_SYNC
All write operations are performed before msync returns. This guarantees that the data is on the disk before the process proceeds, but it also causes the process to wait for a longer period of time.
MS_INVALIDATE
Invalidates any cached copies of the segment in memory, so that any subsequent references to the pages cause the system to bring them in from their backing storage locations.
If msync succeeds, it returns 0. Otherwise, it returns -1 and places the error indication in errno.
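A program that updates a file in place through a MAP_SHARED mapping can call msync with MS_SYNC when it needs to know that the change has been written before it goes on. The following sketch shows the idea; the function name store_and_sync is hypothetical, and the calls are declared with the void * types used by current POSIX systems.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Overwrite n bytes of the file at path, starting at byte pos, with
 * the contents of data, and wait for the change to be written.  The
 * range must lie within the existing file.  Returns 0 on success,
 * -1 on error.
 */
int
store_and_sync(const char *path, size_t pos, const char *data, size_t n)
{
    struct stat st;
    char *base;
    int fd, rc;

    if ((fd = open(path, O_RDWR)) < 0)
        return -1;
    if (fstat(fd, &st) < 0 || pos > (size_t) st.st_size ||
        n > (size_t) st.st_size - pos) {
        close(fd);
        return -1;
    }

    base = mmap(0, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED,
            fd, 0);
    close(fd);
    if (base == MAP_FAILED)
        return -1;

    memcpy(base + pos, data, n);

    /* MS_SYNC: do not return until the writes have been performed. */
    rc = msync(base, st.st_size, MS_SYNC);

    munmap(base, st.st_size);
    return rc;
}
```

Replacing MS_SYNC with MS_ASYNC would let the function return as soon as the writes are scheduled, at the cost of not knowing when they complete.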
