Evaluation

The use cases of our system span various problems, ranging from processing raw data to gaining insights from the derived data, but we can evaluate our design to fulfill the requirements for general use.

The MapReduce library inherently handles the mechanisms of automatic parallelization, data splitting, load balancing, and fault tolerance. Let’s see how our design fulfills additional non-functional requirements.

Fault tolerance

Since we deal with large datasets, fault tolerance is critical. Faults can happen at any stage or component. Let’s see each one in detail.

  1. Worker failure: Let’s consider the scenario of a failed worker. The manager identifies any worker not responding to the periodic calls as a failure. Once the manager declares a worker as a failure, it reschedules all of its completed and in-progress tasks to another available worker (if the actual disk or server fails and is unreachable, and Reduce tasks haven’t fetched out completed map outputs yet, then the manager will need to get them rescheduled. If only the worker process fails, the manager will only need to reschedule the in-progress work). When the reassigned work is completed, the manager notifies all the reducers working on the processed data of the failed mapper and assigns them the new mapper address to fetch data. The manager also wipes down all the processed data by the failed mapper from its local disk to free up space.

    In case of a reducer failure, the manager reassigns its failed tasks to another reducer and provides the new reducer with all the required information for data fetching.

Level up your interview prep. Join Educative to access 70+ hands-on prep courses.