The Problem: Poor Performance

This lesson discusses why the performance of UNIX's old file system was so poor.

The problem with the file system mentioned in the last lesson was that performance was terrible. As measured by Kirk McKusick and his colleagues at Berkeley ["A Fast File System for UNIX" by Marshall K. McKusick, William N. Joy, Sam J. Leffler, and Robert S. Fabry. ACM TOCS, 2:3, August 1984], performance started off bad and got worse over time, to the point where the file system was delivering only 2% of overall disk bandwidth!

(An aside: McKusick was recently honored with the IEEE Reynold B. Johnson award for his contributions to file systems, much of which was based on his work building FFS. In his acceptance speech, he discussed the original FFS software: only 1200 lines of code! Modern versions are a little more complex; for example, the BSD FFS descendant is now in the 50-thousand lines-of-code range.)

The main issue was that the old UNIX file system treated the disk like it was a random-access memory. The data was spread all over the place without regard to the fact that the medium holding the data was a disk, and thus had real and expensive positioning costs. For example, the data blocks of a file were often very far away from its inode, thus inducing an expensive seek whenever one first read the inode and then the data blocks of a file (a pretty common operation).
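To get a feel for how badly positioning costs hurt, consider a rough back-of-envelope calculation. The sketch below uses hypothetical but era-plausible numbers (seek time, rotational delay, transfer rate, and block size are assumptions for illustration, not figures from the paper) to compare the effective bandwidth of a workload that pays a full seek and rotation for every small block against the disk's peak sequential transfer rate.

```c
#include <stdio.h>

int main(void) {
    /* Assumed, illustrative disk parameters (not from the FFS paper). */
    double seek_ms      = 20.0;   /* average seek time (ms)              */
    double rotation_ms  = 8.3;    /* average rotational delay (ms)       */
    double transfer_mbs = 1.0;    /* peak sequential transfer (MB/s)     */
    double block_bytes  = 512.0;  /* old file system block size (bytes)  */

    /* Time to read one block when every access requires positioning. */
    double block_mb     = block_bytes / (1024.0 * 1024.0);
    double transfer_ms  = block_mb / transfer_mbs * 1000.0;
    double total_ms     = seek_ms + rotation_ms + transfer_ms;

    /* Effective bandwidth vs. peak bandwidth. */
    double effective_mbs = block_mb / (total_ms / 1000.0);
    printf("effective bandwidth: %.4f MB/s (%.1f%% of peak)\n",
           effective_mbs, 100.0 * effective_mbs / transfer_mbs);
    return 0;
}
```

With these assumed numbers, almost all of each request's time is spent seeking and rotating rather than transferring data, and effective bandwidth lands in the low single-digit percent of peak, which is exactly the regime the Berkeley measurements describe.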

Worse, the file system would end up getting quite fragmented, as the free space was not carefully managed. The free list would end up pointing to a bunch of blocks spread across the disk, and as files got allocated, they would simply take the next free block. The result was that a logically contiguous file would be accessed by going back and forth across the disk, thus reducing performance dramatically.
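As a rough sketch of this behavior (the free-list contents below are invented for illustration), the following code hands out whatever block happens to be next on the free list; a file whose blocks are logically contiguous ends up physically scattered across the disk.

```c
#include <stdio.h>

/* Hypothetical free list: block numbers left behind by earlier
 * allocations and deletions, in no particular order.            */
static int free_list[] = { 412, 7, 990, 301, 52, 876 };
static int free_count  = 6;
static int next_free   = 0;

/* Naive allocator: return whatever block is next on the free list. */
int alloc_block(void) {
    if (next_free >= free_count)
        return -1;                 /* out of space */
    return free_list[next_free++];
}

int main(void) {
    /* Allocate a logically contiguous 4-block file. */
    printf("file blocks:");
    for (int i = 0; i < 4; i++)
        printf(" %d", alloc_block());
    printf("\n");   /* prints: 412 7 990 301 -- scattered over the disk */
    return 0;
}
```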

For example, imagine the following data block region, which contains four files (A, B, C, and D), each of size 2 blocks:
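One such layout might look like this (the exact placement is illustrative; what matters is that all eight blocks sit next to one another on disk):

```
A1 A2 B1 B2 C1 C2 D1 D2
```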
