On to NFS

In this lesson, we look at one of the earliest distributed systems: the Sun Network File System (NFS).

One of the earliest, and quite successful, distributed systems was developed by Sun Microsystems and is known as the Sun Network File System, or NFS (“The Sun Network File System: Design, Implementation and Experience” by Russel Sandberg, USENIX Summer 1986; this is the original NFS paper, and though a bit of a challenging read, it is worthwhile to see the source of these wonderful ideas). In defining NFS, Sun took an unusual approach: instead of building a proprietary and closed system, Sun developed an open protocol that simply specified the exact message formats that clients and servers would use to communicate. Different groups could develop their own NFS servers and thus compete in an NFS marketplace while preserving interoperability. It worked: today there are many companies that sell NFS servers (including Oracle/Sun, NetApp, EMC, IBM, and others), and the widespread success of NFS is likely attributable to this “open market” approach. (On NetApp, see “File System Design for an NFS File Server Appliance” by Dave Hitz, James Lau, Michael Malcolm, USENIX Winter 1994, San Francisco, California; Hitz et al. were greatly influenced by previous work on log-structured file systems.)

Focus: simple and fast server crash recovery

In this lesson, we will discuss the classic NFS protocol (version 2, a.k.a. NFSv2), which was the standard for many years; small changes were made in moving to NFSv3, and larger-scale protocol changes were made in moving to NFSv4. However, NFSv2 is both wonderful and frustrating and thus serves as our focus.

In NFSv2, the main goal in the design of the protocol was simple and fast server crash recovery. In a multiple-client, single-server environment, this goal makes a great deal of sense; any minute that the server is down (or unavailable) makes all the client machines (and their users) unhappy and unproductive. Thus, as the server goes, so goes the entire system.


Before getting into the details of the NFSv2 protocol, you might be wondering: why do servers crash? Well, as you might guess, there are plenty of reasons. Servers may simply suffer a temporary power outage; only when power is restored can the machines be restarted. Servers are often composed of hundreds of thousands or even millions of lines of code; thus, they have bugs (even good software has a few bugs per hundred or thousand lines of code), and eventually they will trigger one that causes them to crash. They also have memory leaks; even a small memory leak will eventually cause a long-running system to run out of memory and crash. And, finally, in distributed systems, there is a network between the client and the server. If the network acts strangely (for example, if it becomes partitioned, so that clients and servers are working but cannot communicate), it may appear as if a remote machine has crashed when, in reality, it is just not currently reachable through the network.
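The last point, that a partitioned network can look just like a crashed server, can be made concrete with a toy simulation. The sketch below is not NFS code; the `World` class, the `send_request` and `client_view` functions, and the `"ping"` request are all hypothetical names invented for illustration. It assumes the only thing a client can observe is whether a reply arrives before its timeout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class World:
    """Hypothetical model of one server and the network in front of it."""
    server_up: bool = True    # False models a crashed server
    network_up: bool = True   # False models a network partition


def send_request(world: World, request: str) -> Optional[str]:
    """Return the server's reply, or None if nothing arrives before the timeout."""
    if not world.network_up:
        return None  # the request (or its reply) is lost in the network
    if not world.server_up:
        return None  # nobody is there to answer
    return f"ok: {request}"


def client_view(world: World) -> str:
    """What the client can actually observe: a reply, or silence."""
    reply = send_request(world, "ping")
    return "reply received" if reply is not None else "timed out"


healthy = client_view(World())
crashed = client_view(World(server_up=False))
partitioned = client_view(World(network_up=False))

print("healthy:    ", healthy)
print("crashed:    ", crashed)
print("partitioned:", partitioned)
```

In this model, the crashed and partitioned cases produce the same observation at the client (a timeout), which is why a client cannot tell from silence alone which failure has occurred.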
