Introduction

Explore the complexities of writing code for distributed systems that run across multiple networked computers. Understand partial failures, fault tolerance, and the trade-offs between supercomputers and commodity hardware clusters. Gain insights into designing reliable software atop unreliable infrastructure.

Introduction

Writing code that runs on a single node is fairly straightforward. The moment we switch to writing code that runs on multiple computers connected by a network (a distributed system), however, faults and failures can occur in numerous ways, and they are nondeterministic and unpredictable. For example:

Misconfiguration of network switches
Accidental power cycles
Power distribution unit (PDU) failures
Backbone failures for the entire datacenter
Power failure for the entire datacenter
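The failures listed above make one difference from single-node code concrete. A local function call either returns or raises. A request to another machine has a third possible outcome: no reply arrives, and the caller cannot tell whether the work was done. The short Python simulation below illustrates this partial-failure behaviour. It is a sketch under stated assumptions. The function name call_remote, the 10% message-loss rate, the 5% error rate, and the 50 ms mean latency are all hypothetical values chosen for illustration and do not come from any real system.

```python
import random
from collections import Counter
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"  # a reply arrived before the timeout
    FAILURE = "failure"  # an explicit error reply arrived before the timeout
    UNKNOWN = "unknown"  # no reply before the timeout: the work may or may not have happened


# Hypothetical fault parameters, for illustration only.
LOSS_PROBABILITY = 0.10   # request or reply silently dropped (e.g. misconfigured switch)
ERROR_PROBABILITY = 0.05  # remote node answers with an error
MEAN_LATENCY_S = 0.05     # mean network round-trip time in seconds


def call_remote(rng: random.Random, timeout_s: float) -> Outcome:
    """Simulate one request to a hypothetical remote node over an unreliable network."""
    if rng.random() < LOSS_PROBABILITY:
        # The message was lost; the caller simply waits until the timeout fires.
        return Outcome.UNKNOWN
    latency = rng.expovariate(1 / MEAN_LATENCY_S)  # exponentially distributed delay
    if latency > timeout_s:
        # The reply may still arrive later, but the caller has already given up
        # and cannot distinguish "slow" from "dead".
        return Outcome.UNKNOWN
    if rng.random() < ERROR_PROBABILITY:
        return Outcome.FAILURE
    return Outcome.SUCCESS


def run_trials(n: int, seed: int, timeout_s: float = 0.1) -> Counter:
    """Issue n simulated requests and tally the outcomes (seeded for reproducibility)."""
    rng = random.Random(seed)
    return Counter(call_remote(rng, timeout_s) for _ in range(n))


if __name__ == "__main__":
    print(run_trials(10_000, seed=42))
```

Even with a very generous timeout, some requests still end in UNKNOWN because messages can be lost outright. Choosing a timeout therefore cannot fully remove this uncertainty. It can only trade false "node is dead" suspicions against longer waits.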

Distributed systems also suffer ...
