Reading and writing files using memory-mapped I/O

Table of Contents

- Understanding memory-mapped IO
- When to prefer MMIO over file system API
- When to prefer the plain old file system API
- MMIO in action
- Conclusion


Using a file system API to read and write files is not the only way to access files in Linux. Another way is memory-mapped IO (MMIO), in which a file is mapped into a process's address space and accessed like ordinary memory. This alternate file access method is also a common question in technical phone screens. In this blog, we will learn how memory-mapped IO works and how to use it to access files.


Understanding memory-mapped IO#

Memory-mapped IO is set up with the mmap() system call. On Linux, its prototype is:

void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);

The arguments are as follows:

addr: This argument indicates the calling process’s virtual address where the mapping will be located. If we pass NULL, the kernel will choose a suitable, free location for us.
length: This argument specifies the length (in bytes) of the mapping. It determines how much data from the file will be mapped into memory.
prot: This argument sets the memory protection for the mapped region. It is either PROT_NONE or a bitwise OR of one or more of the other flags below:
- PROT_READ: Pages in the mapping can be read.
- PROT_WRITE: Pages in the mapping can be written.
- PROT_EXEC: Pages in the mapping can be executed as code.
- PROT_NONE: Pages in the mapping are inaccessible.
flags: This argument specifies additional options for the mapping:
- MAP_SHARED: Multiple processes can share this mapping. Changes made by one process are visible to other processes that map the same region and, for a file mapping, are carried through to the underlying file.
- MAP_PRIVATE: The mapping is private to the calling process. Changes to the mapped region are not visible to other processes. When the process modifies a page, the kernel creates a private copy of that page (copy-on-write). In the case of a file mapping, the changes are not written to the underlying file. A private mapping is useful, for example, when a process wants to modify its in-memory view of a file without altering the file itself.

fd: This argument is the file descriptor of the file we want to map into memory.
offset: This argument is the offset within the file (specified by fd) where the mapping should start. It must be a multiple of the system page size. To map a file from its beginning, pass 0.

The mmap system call returns a pointer to the mapped memory on success or MAP_FAILED on failure.
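
To make the difference between MAP_SHARED and MAP_PRIVATE concrete, here is a minimal sketch. The helper name file_byte_after_mapped_store and the file paths passed to it are hypothetical and chosen only for illustration. The helper writes some text to a file, stores the byte 'X' through a mapping created with the given flags, and then reads the first byte back from the file with read().

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (not part of any library): returns the first byte of
 * the file after a store through the mapping, or -1 on any error. */
int file_byte_after_mapped_store(const char *path, const char *initial, int flags)
{
    size_t len = strlen(initial);
    if (len == 0) return -1;                 /* mmap rejects a zero length */

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1) return -1;
    if (write(fd, initial, len) != (ssize_t)len) {
        close(fd);
        unlink(path);
        return -1;
    }

    /* addr = NULL lets the kernel choose the address; offset 0 maps from
     * the start of the file. */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        unlink(path);
        return -1;
    }

    p[0] = 'X';                              /* store through the mapping */
    if (flags & MAP_SHARED)
        msync(p, len, MS_SYNC);              /* push the change to the file */

    /* Read the first byte back through the ordinary file API. */
    char c = 0;
    int result = -1;
    if (lseek(fd, 0, SEEK_SET) == 0 && read(fd, &c, 1) == 1)
        result = (unsigned char)c;

    munmap(p, len);
    close(fd);
    unlink(path);                            /* remove the scratch file */
    return result;
}
```

Called with MAP_SHARED, the helper should report 'X' because the store reaches the file. Called with MAP_PRIVATE, it should report the original first byte because the store only touched a private copy of the page.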

When to prefer MMIO over file system API#

File access using MMIO can often simplify program logic compared to explicit read() and write() calls. Consider an application that serves clients' requests for different parts of a large file. With the file system API, we need an explicit seek to move the file offset before each access. Using MMIO, we can map portions of the file into memory and access them as if they were in-memory arrays.
MMIO can also have lower latency than read() and write() calls. A read() or write() call involves two data transfers: one between the file and a buffer in the kernel, and a second between the kernel buffer and the user-land buffer. Using MMIO, we avoid the second copy. MMIO also saves memory because the user accesses the same pages in which the kernel holds the data, so no separate user-land buffer is needed.
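
The random-access case can be sketched as follows. This is only an illustration under stated assumptions: the helper names map_file_readonly and sum_of_records are hypothetical, and the file is assumed to hold a flat array of uint32_t records in the machine's native byte order.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps the whole file read-only and stores its size in
 * *size_out. Returns NULL on failure (including an empty file). */
const void *map_file_readonly(const char *path, size_t *size_out)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) return NULL;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);               /* the mapping stays valid after close() */
    if (p == MAP_FAILED) return NULL;

    *size_out = (size_t)st.st_size;
    return p;
}

/* Fetches records in arbitrary order by plain array indexing. The caller
 * must make sure every index is within the mapped record count. */
uint32_t sum_of_records(const uint32_t *records, const size_t *indices, size_t n)
{
    uint32_t sum = 0;
    for (size_t k = 0; k < n; k++)
        sum += records[indices[k]];   /* no lseek() or read() per access */
    return sum;
}
```

A caller would map the file once, cast the returned pointer to const uint32_t *, serve as many lookups as needed, and call munmap() with the reported size when done.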

When to prefer the plain old file system API#

If we are reading a file sequentially, MMIO might not give any benefit over read() because the IO cost of moving data from disk to memory is incurred in both cases.
Small IO operations using MMIO are likely to be costlier than plain read() and write() calls because setting up memory pages for the memory management unit (MMU) hardware, such as creating the page mappings and setting access rights, has its own cost.
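
For comparison, the sequential case needs nothing more than a read() loop. The sketch below is illustrative only: the function name count_byte_sequential and the 4096-byte buffer size are assumptions made for this example.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: counts how often `target` occurs in the file by
 * reading it front to back with read(). Returns -1 on error. */
long count_byte_sequential(const char *path, char target)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) return -1;

    char buf[4096];                   /* fixed user-land buffer */
    long count = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; i++) {
            if (buf[i] == target)
                count++;
        }
    }

    close(fd);
    return n == 0 ? count : -1;       /* n == 0 means end of file */
}
```

There is no mapping to create or tear down here, which is why this style tends to suit one-pass scans and small reads.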

MMIO in action#

We now write and read files using MMIO. The following code uses two functions, mmio_write() and mmio_read(), for writing and reading files. We have annotated the code to provide information in context. Let's read the code carefully and then run it to see MMIO in action.

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <string.h>
// function prototypes
int mmio_read(const char* filename);
int mmio_write(const char* filename, const char* text);
int main() {
    // You can change filename and text for experimentation
    const char* filename = "example.txt";
    const char* text = "Hello, mmap!";
    printf("[Step 1:] Reading a non-existing file via mmap. Should give an error.\n");
    printf("Error is printed on stderr (instead of stdout).\n");
    mmio_read(filename);
    printf("[Step 2:] Creating a new file and writing to it via mmap.\n");
    mmio_write(filename, text);
    printf("[Step 3:] Reading what we wrote in the previous step.\n");
    mmio_read(filename);
 
    return 0;
}
// writing using MMIO
int mmio_write(const char* filename,
               const char* text
              )
{
 
    // Open the file for reading and writing, creating or truncating it as needed.
    // A writable MAP_SHARED mapping requires the file to be opened with O_RDWR.
    // The mode 0600 (octal notation) gives the owner of the file read and
    // write permissions.
    int fd = open(filename, O_RDWR | O_CREAT | O_TRUNC, (mode_t)0600);
    if (fd == -1) {
        perror("File opening failed");
        return EXIT_FAILURE;
    }
    // The file will hold exactly the bytes of the text (no terminating NUL)
    off_t file_size = strlen(text);
    // Set the file size using ftruncate.
    // Writing via MMIO to a file region that does not actually exist generates
    // a SIGBUS signal, so the file must be sized before we write to the mapping.
    if (ftruncate(fd, file_size) == -1) {
        // Report the error before close() can overwrite errno
        perror("ftruncate failed");
        close(fd);
        return EXIT_FAILURE;
    }
    // Map the file into memory
    char* mapped_data = (char*)mmap(NULL, file_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mapped_data == MAP_FAILED) {
        // Report the error before close() can overwrite errno
        perror("mmap");
        close(fd);
        return EXIT_FAILURE;
    }
    // Copy the data into the mapped memory
    memcpy(mapped_data, text, file_size);
    // Flush changes to the file (optional).
    // msync with MS_SYNC has an IO cost because it blocks until the data is
    // flushed to the persistent store.
    if (msync(mapped_data, file_size, MS_SYNC) == -1) {
        perror("msync");
    }
    // Unmap the memory
    if (munmap(mapped_data, file_size) == -1) {
        perror("munmap");
    }
    // Close the file
    close(fd);
    printf("Data has been written to %s\n", filename);
    return EXIT_SUCCESS;
}
// reading using MMIO
int mmio_read(const char* filename)
{
    // Open the file for reading
    int fd = open(filename, O_RDONLY);
    if (fd == -1) {
        perror("File opening failed");
        return EXIT_FAILURE;
    }
    // Determine the size of the file
    struct stat file_info;
    if (fstat(fd, &file_info) == -1) {
        // Report the error before close() can overwrite errno
        perror("fstat failed");
        close(fd);
        return EXIT_FAILURE;
    }
    off_t file_size = file_info.st_size;
    // Map the file into memory for reading
    char* mapped_data = (char*)mmap(NULL, file_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (mapped_data == MAP_FAILED) {
        // Report the error before close() can overwrite errno
        perror("mmap");
        close(fd);
        return EXIT_FAILURE;
    }
    // Close the file (not needed after mapping)
    close(fd);
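    // Now we can access the contents of the file using mapped_data.
    // The mapped bytes are not NUL-terminated, so print exactly file_size
    // bytes instead of treating the mapping as a C string.
    printf("File Contents:\n");
    fwrite(mapped_data, 1, file_size, stdout);
    printf("\n");
    // Unmap the memory
    if (munmap(mapped_data, file_size) == -1) {
        perror("munmap");
    }
    return EXIT_SUCCESS;
}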

Written By:

Abdul Qadeer
