Cache Consistency

Let's look at the cache consistency problem with respect to the AFS.

When we discussed NFS, there were two aspects of cache consistency we considered: update visibility and cache staleness. With update visibility, the question is: when will the server be updated with a new version of a file? With cache staleness, the question is: once the server has a new version, how long before clients see the new version instead of an older cached copy?

ASIDE: CACHE CONSISTENCY IS NOT A PANACEA

When discussing distributed file systems, much is made of the cache consistency the file systems provide. However, this baseline consistency does not solve all problems with regard to file access from multiple clients. For example, if you are building a code repository, with multiple clients performing check-ins and check-outs of code, you can’t simply rely on the underlying file system to do all of the work for you. Rather, you have to use explicit file-level locking in order to ensure that the “right” thing happens when such concurrent accesses take place. Indeed, any application that truly cares about concurrent updates will add extra machinery to handle conflicts. The baseline consistency described in this chapter and the previous one is useful primarily for casual usage, i.e., when a user logs into a different client, they expect some reasonable version of their files to show up there. Expecting more from these protocols is setting yourself up for failure, disappointment, and tear-filled frustration.
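As a concrete sketch of such explicit locking, the check-in helper below uses POSIX advisory flock() locks; the checkin() function and its path handling are hypothetical, and whether advisory locks actually coordinate clients across machines depends on the distributed file system in question:

```c
/*
 * A minimal sketch of explicit file-level locking for a check-in,
 * assuming POSIX advisory locks via flock(). The checkin() helper is
 * hypothetical; whether advisory locks coordinate clients across
 * machines depends on the distributed file system being used.
 */
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

int checkin(const char *path) {
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX) < 0) {   // block until the exclusive lock is held
        close(fd);
        return -1;
    }
    /* ... read/modify the repository file safely here ... */
    flock(fd, LOCK_UN);             // release so other clients may proceed
    close(fd);
    return 0;
}
```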

Because of callbacks and whole-file caching, the cache consistency provided by AFS is easy to describe and understand. There are two important cases to consider: consistency between processes on different machines, and consistency between processes on the same machine.

Consistency between processes on different machines

Between different machines, AFS makes updates visible at the server and invalidates cached copies at the exact same time, which is when the updated file is closed. A client opens a file and then writes to it (perhaps repeatedly). When the file is finally closed, the new file is flushed to the server (and thus visible). At this point, the server then “breaks” callbacks for any clients with cached copies; the break is accomplished by contacting each client and informing it that the callback it has on the file is no longer valid. This step ensures that clients will no longer read stale copies of the file; subsequent opens on those clients will require a re-fetch of the new version of the file from the server (and will also serve to reestablish a callback on the new version of the file).
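The following toy model sketches that callback-breaking step; it is not the actual AFS server code, and the callback list, send_break() stub, and handle_close() routine are all invented for illustration:

```c
/*
 * A toy model of the server's callback break, not actual AFS code:
 * the callback list, send_break(), and handle_close() are invented.
 */
#include <stdio.h>
#include <stdlib.h>

struct callback {
    int client_id;                  // a client holding a cached copy
    struct callback *next;
};

/* Stand-in for the RPC that tells a client its callback is invalid. */
static void send_break(int client_id, int file_id) {
    printf("break callback: client %d, file %d\n", client_id, file_id);
}

/* Run when some client's close() flushes a new version of file_id:
 * every cached copy elsewhere is now stale, so break each callback. */
static void handle_close(int file_id, struct callback **list) {
    struct callback *cb = *list;
    while (cb != NULL) {
        struct callback *next = cb->next;
        send_break(cb->client_id, file_id);
        free(cb);                   // client must re-fetch and re-register
        cb = next;
    }
    *list = NULL;
}

int main(void) {
    struct callback *list = NULL;
    for (int c = 1; c <= 2; c++) {  // clients 1 and 2 cache file 7
        struct callback *cb = malloc(sizeof(*cb));
        cb->client_id = c;
        cb->next = list;
        list = cb;
    }
    handle_close(7, &list);         // a third client just closed file 7
    return 0;
}
```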

Consistency between processes on the same machine

AFS makes an exception to this simple model between processes on the same machine. In this case, writes to a file are immediately visible to other local processes (i.e., a process does not have to wait until a file is closed to see its latest updates). This makes using a single machine behave exactly as you would expect, as this behavior is based upon typical UNIX semantics. Only when switching to a different machine would you be able to detect the more general AFS consistency mechanism.
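The sketch below illustrates this local, UNIX-like behavior using plain POSIX calls (the AFS path is hypothetical): the reading process sees the writer's data even though the writer has not yet called close(), i.e., before anything has been flushed to the server:

```c
/*
 * Illustration of the same-machine case using plain POSIX calls; the
 * AFS path below is hypothetical. The child reads the parent's write
 * even though the parent has not yet closed the file.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    const char *path = "/afs/example/shared.txt";   // hypothetical path
    int wfd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0644);
    write(wfd, "hello", 5);         // written, but the file is still open

    if (fork() == 0) {              // a second process on the same machine
        int rfd = open(path, O_RDONLY);
        char buf[6] = {0};
        read(rfd, buf, 5);
        printf("reader sees: %s\n", buf);   // "hello", before writer's close()
        close(rfd);
        _exit(0);
    }
    wait(NULL);
    close(wfd);                     // only now is the update flushed to the server
    return 0;
}
```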

Last writer wins

There is one interesting cross-machine case that is worthy of further discussion. Specifically, in the rare case that processes on different machines are modifying a file at the same time, AFS naturally employs what is known as a last writer wins approach (which perhaps should be called the last closer wins). Specifically, whichever client calls close() last will update the entire file on the server last and thus will be the “winning” file, i.e., the file that remains on the server for others to see. The result is a file that was generated in its entirety either by one client or the other. Note the difference from a block-based protocol like NFS: in NFS, writes of individual blocks may be flushed out to the server as each client is updating the file, and thus the final file on the server could end up as a mix of updates from both clients. In many cases, such a mixed file output would not make much sense, i.e., imagine a JPEG image getting modified by two clients in pieces; the resulting mix of writes would not likely constitute a valid JPEG.
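To see why whole-file flushing implies this behavior, consider the toy model below; the server_copy buffer and handle_flush() routine are invented for illustration. Each close() replaces the server's entire copy of the file, so the final contents come wholly from whichever client closed last:

```c
/*
 * A toy model of why whole-file flushing yields "last closer wins";
 * server_copy and handle_flush() are invented for illustration.
 */
#include <stdio.h>
#include <string.h>

static char server_copy[4096];      // the server's whole-file contents

/* A client's close() flushes its full file image, replacing the
 * server's entire copy rather than individual blocks. */
static void handle_flush(const char *client_image, size_t len) {
    memcpy(server_copy, client_image, len);
    server_copy[len] = '\0';
}

int main(void) {
    handle_flush("AAAA", 4);        // client 1 closes first
    handle_flush("BBBB", 4);        // client 2 closes last
    printf("server holds: %s\n", server_copy);   // "BBBB", never a mix
    return 0;
}
```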
