The Linux Virtual Memory System: The Page Cache

In this lesson, you'll learn how the Linux​ page cache stores data and uses the 2Q replacement policy to free up space.

To reduce costs of accessing persistent storage (the focus of the third part of this course), most systems use aggressive caching subsystems to keep popular data items in memory. Linux, in this regard, is no different than traditional operating systems.

How linux page cache works

The Linux page cache is unified, keeping pages in memory from three primary sources: memory-mapped files, file data and metadata from devices (usually accessed by directing read() and write() calls to the file system), and heap and stack pages that comprise each process (sometimes called anonymous memory, because there is no named file underneath of it, but rather swap space). These entities are kept in a page cache hash table, allowing for quick lookup when said data is needed.

The page cache tracks if entries are clean (read but not updated) or dirty (a.k.a., modified). Dirty data is periodically written to the backing store (i.e., to a specific file for file data, or to swap space for anonymous regions) by background threads (called pdflush), thus ensuring that modified data eventually is written back to persistent storage. This background activity either takes place after a certain time period or if too many pages are considered dirty (both configurable parameters).

2Q replacement policy

In some cases, a system runs low on memory, and Linux has to decide which pages to kick out of memory to free up space. To do so, Linux uses a modified form of 2Q replacement“2Q: A Low Overhead High Performance Buffer Management Replacement Algorithm” by T. Johnson, D. Shasha. VLDB ’94, Santiago, Chile. A simple but effective approach to building page replacement., which we describe here.

The basic idea is simple: standard LRU replacement is effective, but can be subverted by certain common access patterns. For example, if a process repeatedly accesses a large file (especially one that is nearly the size of memory, or larger), LRU will kick every other file out of memory. Even worse: retaining portions of this file in memory isn’t useful, as they are never re-referenced before getting kicked out of memory.

The Linux version of the 2Q replacement algorithm solves this problem by keeping two lists, and dividing memory between them. When accessed for the first time, a page is placed on one queue (called A1 in the original paper, but the inactive list in Linux); when it is re-referenced, the page is promoted to the other queue (called Aq in the original, but the active list in Linux). When replacement needs to take place, the candidate for replacement is taken from the inactive list. Linux also periodically moves pages from the bottom of the active list to the inactive list, keeping the active list to about two-thirds of the total page cache sizeUnderstanding the Linux Virtual Memory Manager” by M. Gorman. Prentice Hall, 2004. An in-depth look at Linux VM, but alas a little out of date..

Linux would ideally manage these lists in perfect LRU order, but, as discussed in earlier chapters, doing so is costly. Thus, as with many OSes, an approximation of LRU (similar to clock replacement) is used.

This 2Q approach generally behaves quite a bit like LRU, but notably handles the case where a cyclic large-file access occurs by confining the pages of that cyclic access to the inactive list. Because said pages are never re-referenced before getting kicked out of memory, they do not flush out other useful pages found in the active list.

ASIDE: THE UBIQUITY OF MEMORY-MAPPING

Memory mapping predates Linux by some years, and is used in many places within Linux and other modern systems. The idea is simple: by calling mmap() on an already opened file descriptor, a process is returned a pointer to the beginning of a region of virtual memory where the contents of the file seem to be located. By then using that pointer, a process can access any part of the file with a simple pointer dereference.

Accesses to parts of a memory-mapped file that have not yet been brought into memory trigger page faults, at which point the OS will page in the relevant data and make it accessible by updating the page table of the process accordingly (i.e., demand paging).

Every regular Linux process uses memory-mapped files, even the code in main() does not call mmap() directly, because of how Linux loads code from the executable and shared library code into memory. Below is the (highly abbreviated) output of the pmap command line tool, which shows what different mapping comprise the virtual address space of a running program (the shell, in this example, tcsh). The output shows four columns: the virtual address of the mapping, its size, the protection bits of the region, and the source of the mapping:

0000000000400000 372K r-x-- tcsh
00000000019d5000 1780K rw---   [anon ]
00007f4e7cf06000 1792K r-x-- libc-2.23.so
00007f4e7d2d0000 36K r-x-- libcrypt2.23.so
00007f4e7d508000  148K r-x-- libtinfo.so.5.9
00007f4e7d731000  152K r-x-- ld-2.23.so
00007f4e7d932000  16K rw---   [stack ]

As you can see from this output, the code from the tcsh binary, as well as code from libc, libcrypt, libtinfo, and code from the dynamic linker itself (ld.so) are all mapped into the address space. Also present are two anonymous regions, the heap (the second entry, labeled anon) and the stack (labeled stack). Memory-mapped files provide a straight- forward and efficient way for the OS to construct a modern address space.

Get hands-on with 1200+ tech skills courses.