Using Checksums

This lesson discusses how checksums are used in disks to detect disk corruption.

With a checksum layout decided upon, we can now proceed to actually understand how to use the checksums. When reading a block $D$ , the client (i.e., file system or storage controller) also reads its checksum from disk $C_s(D)$ , which we call the stored checksum (hence the subscript $C_s$ ). The client then computes the checksum over the retrieved block $D$ , which we call the computed checksum $C_c(D)$ . At this point, the client compares the stored and computed checksums; if they are equal (i.e., $C_s(D) == C_c(D)$ , the data has likely not been corrupted and thus can be safely returned to the user. If they do not match (i.e., $C_s(D) != C_c(D)$ ), this implies the data has changed since the time it was stored (since the stored checksum reflects the value of the data at that time). In this case, we have a corruption, which our checksum has helped us to detect.

Given a corruption, the natural question is what should we do about it? If the storage system has a redundant copy, the answer is easy: try to use it instead. If the storage system has no such copy, the likely answer is to return an error. In either case, realize that corruption detection is not a magic bullet; if there is no other way to get the non-corrupted data, you are simply out of luck.

Get hands-on with 1200+ tech skills courses.

Introduction

Virtualization: Processes

Virtualization: Process API

Virtualization: Direct Execution

Virtualization: CPU Scheduling

Virtualization: Multi-Level Feedback

Virtualization: Lottery Scheduling

Virtualization: Multi-CPU Scheduling

Virtualization: Address Space

Virtualization: Memory API

Virtualization: Address Translation

Virtualization: Segmentation

Virtualization: Free Space Management

Virtualization: Introduction to Paging

Virtualization: Translation Lookaside Buffers

Virtualization: Advanced Page Tables

Virtualization: Swapping: Mechanisms

Virtualization: Swapping: Policies

Virtualization: Complete VM Systems

Concurrency: Concurrency and Threads

Concurrency: Thread API

Concurrency: Locks

Concurrency: Locked Data Structures

Concurrency: Conditional Variables

Concurrency: Semaphores

Concurrency: Concurrency Bugs

Concurrency: Event-Based Concurrency

Persistence: I/O Devices

Persistence: Hard Disk Drives

Persistence: Redundant Disk Arrays (RAID)

Persistence: Files and Directories

Persistence: File System Implementation

Persistence: Fast File System

Persistence: FSCK and Journaling

Persistence: Log-Structured File System

Persistence: Flash-based SSDs

Persistence: Data Integrity and Protection

Distribution: Distributed Systems

Distribution: Network File System (NFS)

Distribution: Andrew File System (AFS)

Using Checksums