The Big Picture

This lesson gives the reader a new perspective on HDFS.

In this lesson, we’ll discuss the architecture of HDFS, its goals, and its limitations. The Hadoop Distributed File System (HDFS) was designed with the following goals in mind:

  • Large files: The system should store large files ranging in size from several hundred gigabytes to petabytes.

  • Streaming data access: HDFS is optimized and built for a write-once, read-many-times pattern. The time to read the entire dataset matters more than the latency of reading the first record. HDFS does not support multiple concurrent writers, and existing files can only be appended to at the very end; modifying a file at an arbitrary offset is not possible (see the sketch after this list).

  • Commodity hardware: Hadoop is designed to run on clusters of cheap commodity hardware rather than expensive, specialized machines. On such clusters the chance of hardware failure is high, yet the system is expected to keep working correctly. In keeping with that view, HDFS is highly fault-tolerant and designed to be deployed on low-cost hardware.
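
To make the write-once, append-only model concrete, here is a minimal sketch using the Hadoop FileSystem Java API. The cluster URI and file path are hypothetical placeholders; a real client would normally pick up fs.defaultFS from its core-site.xml.

    // Minimal sketch of HDFS's write-once / append-only access pattern.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    import java.nio.charset.StandardCharsets;

    public class AppendOnlyExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical namenode address; normally read from core-site.xml.
            conf.set("fs.defaultFS", "hdfs://namenode:8020");

            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/user/demo/events.log");  // hypothetical path

            // Write once: create the file and stream records into it.
            try (FSDataOutputStream out = fs.create(file, /* overwrite */ true)) {
                out.write("first record\n".getBytes(StandardCharsets.UTF_8));
            }

            // Later writes can only append to the end of the existing file.
            try (FSDataOutputStream out = fs.append(file)) {
                out.write("appended record\n".getBytes(StandardCharsets.UTF_8));
            }

            // There is no call for modifying bytes at an arbitrary offset, and
            // only one writer may have the file open for writing at a time.
            fs.close();
        }
    }

Note that append() only ever adds bytes to the end of the file, which is exactly the limitation described above.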

Working of HDFS

A filesystem, distributed or local, must know the location of the disk blocks that make up a file; only then can it retrieve those blocks for a client. The filesystem must also return a file's metadata to the client. These requirements are reflected in the two software daemons that make up HDFS (a short client-side sketch follows the list below):

  • Namenode (NN)
  • Datanode (DN)
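
As a rough illustration of this split of responsibilities, the sketch below asks the filesystem client for a file's metadata and block locations, both of which are served by the namenode; the block contents themselves would later be streamed from datanodes. The file path is a hypothetical placeholder.

    // Minimal sketch: fetch file metadata and block locations from the namenode.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocationsExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path("/user/demo/events.log");  // hypothetical path

            // File metadata (size, block size, replication) comes from the namenode.
            FileStatus status = fs.getFileStatus(file);
            System.out.printf("len=%d blockSize=%d replication=%d%n",
                    status.getLen(), status.getBlockSize(), status.getReplication());

            // For each block, the namenode reports which datanodes hold a replica.
            for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
                System.out.printf("offset=%d length=%d hosts=%s%n",
                        block.getOffset(), block.getLength(),
                        String.join(",", block.getHosts()));
            }
            fs.close();
        }
    }

Similar information can be inspected from the command line with hdfs fsck <path> -files -blocks -locations.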
