Block Replication

This lesson discusses how HDFS replicates data blocks across the cluster for redundancy.

Before discussing what makes HDFS fault-tolerant and highly available, let’s first understand the term fault tolerance. Wikipedia defines fault tolerance as the property that enables a system to continue operating properly in the event of the failure of some of its components. One traditional form of fault-tolerant computing is full hardware redundancy: two (or more) systems operate in tandem, mirroring identical applications and executing instructions in lock step. When a hardware failure occurs in the primary system, the secondary system running an identical application takes over with no loss of service and zero downtime.

The secret sauce behind HDFS’s ability to withstand corrupted or lost data is the replication of data blocks. If a file consists of 4 HDFS blocks and the replication factor is 3, then each data block has 3 copies of itself. These copies, spread across physically separate machines in the cluster, add up to 12 blocks in total. Replication ensures that if one data block becomes corrupted or a hardware failure occurs, the read request can still be fulfilled by another available clone of the block. This setup also allows for self-healing: a block lost to corruption or machine failure can be re-replicated to other live machines by copying a healthy clone. The replication factor is controlled by the property dfs.replication. Some applications use a higher replication factor for the data blocks of a popular file to better distribute the read load across the cluster.
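For reference, the cluster-wide default replication factor is typically set in hdfs-site.xml; the fragment below is a sketch of that configuration (3 is the standard default):

```xml
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```

The replication factor can also be changed per file after the fact with the filesystem shell, e.g. `hdfs dfs -setrep -w 5 /path/to/popular-file`, which is how a popular file can be given extra replicas to spread read load.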

Selection of nodes for replication

When it comes to the placement of replicated data blocks in a cluster, there is a spectrum of possibilities. On one end of the spectrum, all three replicas of a block can be placed on one node; on the other, they can be placed in separate data centers. In the former case, the write bandwidth is minimized because the replicas are written to the same node, but no redundancy is realized: if the single node goes down, all replicas are lost. In the latter case, redundancy is maximized at the cost of bandwidth.

In practice, Hadoop places the first replica on the same node as the client. If the client is running outside the cluster, a node is chosen at random. The second replica is placed on a node in a randomly chosen rack different from the first replica's rack. The third replica is placed on a randomly chosen node on the same rack as the second replica. Any further replicas are placed on randomly selected nodes, while avoiding placing too many replicas in the same rack.
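The placement policy above can be sketched as a small simulation. This is an illustrative model, not Hadoop's actual implementation; the function and variable names (`choose_replica_nodes`, `cluster`) are hypothetical:

```python
import random

def choose_replica_nodes(cluster, client_node=None, replication=3):
    """Sketch of the default HDFS placement policy for 3 replicas:
    1st on the client's node (or a random node if the client is
    outside the cluster), 2nd on a node in a different rack,
    3rd on another node in the 2nd replica's rack."""
    # cluster maps rack name -> list of node names
    rack_of = {node: rack for rack, nodes in cluster.items() for node in nodes}
    all_nodes = [n for nodes in cluster.values() for n in nodes]

    # First replica: the client's own node if it lives in the cluster.
    first = client_node if client_node in rack_of else random.choice(all_nodes)

    # Second replica: a random node on a rack different from the first's.
    off_rack = [n for n in all_nodes if rack_of[n] != rack_of[first]]
    second = random.choice(off_rack)

    # Third replica: a different node on the same rack as the second.
    same_rack = [n for n in cluster[rack_of[second]] if n != second]
    third = random.choice(same_rack)

    return [first, second, third]

# Two racks of four machines each, client running on node "n1".
cluster = {
    "rack1": ["n1", "n2", "n3", "n4"],
    "rack2": ["n5", "n6", "n7", "n8"],
}
placement = choose_replica_nodes(cluster, client_node="n1")
print(placement)
```

Note how the policy trades off the extremes described earlier: one replica stays local for cheap writes, while the other two sit together on a different rack, surviving the loss of either rack without paying for cross-data-center traffic.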

A pictorial depiction of how a block gets replicated on a two-rack cluster with eight machines is shown below:
