
The Big Picture

Explore the architecture and core goals of HDFS including its design for storing large files, streaming data access, fault tolerance on commodity hardware, and data locality. Understand the roles of Namenode and Datanodes, how they interact with clients, and HDFS limitations such as latency and handling numerous small files.


In this lesson, we’ll discuss the architecture of HDFS, its goals, and its limitations. The Hadoop Distributed File System (HDFS) was designed with the following goals in mind:

  • Large files: The system should store large files that are several hundred gigabytes, or even petabytes, in size.
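To make the "large files" goal concrete, the arithmetic below sketches how a big file maps onto fixed-size HDFS blocks. The 128 MB block size is the Hadoop 2.x+ default; the 1 GB file size is an arbitrary example, not a figure from this lesson.

```python
import math

# Default HDFS block size in Hadoop 2.x and later (configurable via dfs.blocksize).
BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB in bytes

# A hypothetical 1 GB file, chosen purely for illustration.
file_size = 1 * 1024 ** 3        # 1 GB in bytes

# HDFS splits the file into block-sized chunks; the last block may be smaller.
num_blocks = math.ceil(file_size / BLOCK_SIZE)
last_block_size = file_size - (num_blocks - 1) * BLOCK_SIZE

print(num_blocks)        # 8 blocks for a 1 GB file
print(last_block_size)   # 134217728 (the last block is exactly full here)
```

A file of several petabytes would simply yield millions of such blocks, each stored and replicated independently across Datanodes.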

  • Streaming data access: HDFS is optimized for a write-once, read-many-times access pattern. The time taken to read an entire dataset matters more than the latency of reading the first record. HDFS doesn’t support multiple concurrent writers, and existing files can only be appended to at the end; modifying a file at an arbitrary offset is not possible.
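The write-once, append-only model above can be sketched as a tiny conceptual class. This is an illustration of the access semantics only, not the real HDFS client API: appends at the end succeed, while writes at an arbitrary offset are rejected.

```python
import io

class WriteOnceAppendFile:
    """Conceptual model of HDFS file semantics (not the actual HDFS API):
    existing bytes are immutable; new data may only be appended at the end."""

    def __init__(self):
        self._data = bytearray()

    def append(self, chunk: bytes) -> None:
        # The only mutation HDFS-style semantics allow: extend at the end.
        self._data.extend(chunk)

    def write_at(self, offset: int, chunk: bytes) -> None:
        # Modifying a file at an arbitrary offset is not possible in HDFS.
        raise io.UnsupportedOperation("random-offset writes are not allowed")

    def read(self) -> bytes:
        return bytes(self._data)

f = WriteOnceAppendFile()
f.append(b"record 1\n")
f.append(b"record 2\n")
print(f.read())   # b'record 1\nrecord 2\n'
```

This append-only discipline is what lets HDFS keep replicas simple and favor high sustained throughput over random-access flexibility.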

  • Commodity hardware: Hadoop is designed to run on clusters of cheap commodity hardware. It does not require expensive specialized hardware. The ...