This chapter sets the stage for the course, emphasizing learning from historical systems and balancing innovation with established design practices.
This chapter sets the stage for exploring distributed file systems, focusing on advancements in data management with systems like GFS, Colossus, and Tectonic.
3.
Google File System (GFS)
11 Lessons
This chapter covers the Google File System (GFS), focusing on efficient management of large data files with scalability, fault tolerance, and high throughput.
4.
Google Colossus File System
3 Lessons
This chapter covers Colossus, which improves scalability and performance over GFS using a distributed metadata model for better data management and low latency.
5.
Facebook's Tectonic File System
8 Lessons
This chapter discusses Tectonic File System, providing scalable storage with performance isolation and optimized resource management for diverse workloads.
This chapter covers the evolution from relational to NoSQL databases, highlighting the balance between scalability, availability, and consistency.
Introduction to Distributed Databases
This chapter covers Bigtable, a scalable storage solution for managing large datasets, enhancing performance and availability with its unique design.
Introduction to Bigtable
Data Model of Bigtable
Detailed Design of Bigtable: Part I
Detailed Design of Bigtable: Part II
Design Refinements in Bigtable
Evaluation of Bigtable
Quiz on Bigtable
8.
Google Megastore
6 Lessons
This chapter covers Megastore, blending NoSQL scalability with relational features for high availability, ACID transactions, and optimized cloud performance.
Introduction to Megastore
High-level Design for Better Availability and Scalability
Data Model of Megastore
Replication in Megastore
Evaluation of Megastore
Quiz on Megastore
This chapter covers Google Spanner, combining relational features with NoSQL scalability for strong consistency, high availability, and global data management.
Introduction to Spanner
Detailed Design of Spanner
Database Buckets and Data Model of Spanner
TrueTime API in Spanner
Spanner, TrueTime, and the CAP Theorem
Concurrency Control in Spanner
Database Operations in Spanner
Evaluation of Spanner
Quiz on Spanner
This chapter introduces key-value stores, crucial for caching, NoSQL databases, and enhancing scalability and availability in modern distributed applications.
Introduction to Key-value Stores
11.
Many-core Key-value Store
5 Lessons
This chapter covers the many-core key-value store, enhancing efficiency and scalability while addressing power consumption and performance challenges.
Motivation and Requirements for a Many-core Approach
Estimations and Limitations of a Many-core System
Detailed Design of a Many-core System
Evaluation of the Many-core System
Quiz on Many-core Systems
12.
Scaling Memcache
7 Lessons
This chapter explores Memcache scaling strategies, addressing performance, consistency, and network efficiency challenges across various operational levels.
Introduction to Scaling Memcache
Single-server Level of Memcache
Cluster Level of Memcache
Regional Level of Memcache
Cross-regional Level of Memcache
Evaluation of Memcache
Quiz on Memcache
This chapter covers SILT, which optimizes key-value storage with a multi-store architecture, focusing on memory efficiency, low latency, and data management.
Introduction to SILT
High-level Design of SILT
A Write-friendly Store for SILT: Part I
A Write-friendly Store for SILT: Part II
A Write-friendly Store for SILT: Part III
Intermediary Store(s) in SILT
A Memory-efficient Store for SILT: Part I
A Memory-efficient Store for SILT: Part II
A Memory-efficient Store for SILT: Part III
Request Flows in SILT
Evaluating and Extending the Design of SILT
Quiz on SILT
This chapter covers DynamoDB, a managed NoSQL service designed for high availability, strong durability, and scalability, meeting diverse data management needs.
Introduction to DynamoDB
High-level Design of DynamoDB
No Fixed Schema in DynamoDB
Partitioning and Replication in DynamoDB
Adapting to Traffic Patterns in DynamoDB
Durability and Correctness in DynamoDB
Ensuring High Availability in DynamoDB
Quiz on DynamoDB
15.
Concurrency Management
1 Lesson
This chapter introduces concurrency management methods for efficiently handling simultaneous client requests in distributed systems.
Introduction to Concurrency Management
16.
Two-phase Locking (2PL)
3 Lessons
This chapter covers 2PL, a concurrency control mechanism ensuring data integrity, while addressing challenges like deadlocks and throughput issues.
Introduction to Two-Phase Locking (2PL)
Analysis and Evaluation of Two-Phase Locking (2PL)
Quiz on 2PL
17.
Google Chubby Locking Service
8 Lessons
This chapter covers Chubby, a distributed locking service that enhances coordination, availability, and fault tolerance in Google’s systems with robust design.
Introduction to Chubby
Detailed Design of Chubby: Part I
Detailed Design of Chubby: Part II
Detailed Design of Chubby: Part III
Detailed Design of Chubby: Part IV
The Rationale Behind Chubby’s Design
Evaluation of Chubby
Quiz on Chubby
This chapter covers ZooKeeper, a coordination system for distributed environments, offering efficient resource management and high availability.
Introduction to ZooKeeper
Detailed Design of ZooKeeper
Primitives of ZooKeeper
Evaluation of ZooKeeper
Quiz on ZooKeeper
19.
Big Data Processing: Batch to Stream Processing
1 Lesson
This chapter explores the evolution and significance of big data processing systems like MapReduce, Spark, and Kafka in data handling and management.
Introduction to Big Data Processing Systems
This chapter covers MapReduce, which simplifies processing large datasets with a user-friendly model that enables efficient parallelization and fault tolerance.
System Design: MapReduce
High-level Design of MapReduce
MapReduce: Detailed Design
Design Refinements in MapReduce: Part I
Design Refinements in MapReduce: Part II
MapReduce: Evaluation
Concluding MapReduce
Quiz on MapReduce
This chapter covers Spark's architecture, focusing on in-memory processing, RDDs, and features for low latency and fault tolerance.
Introduction to Spark
Requirements of Spark
High-level Design of Spark
Resilient Distributed Datasets of Spark
Parallel Operations in Spark
Shared Variables in Spark
Detailed Design of Spark
Refinements in Spark
Evaluation of Spark
Quiz on Spark
This chapter introduces Kafka, a powerful messaging system for real-time event streaming, known for high scalability, efficiency, and reliable data delivery.
Introduction to Kafka
High-level Design of Kafka
Detailed Design of Kafka
Efficiency of Kafka
Distributed Coordination in Kafka
Delivery Guarantees of Kafka
Evaluation of Kafka
Quiz on Kafka
This chapter introduces consensus in distributed systems, covering algorithms like Paxos and Raft, and key concepts like FLP and Byzantine faults.
Introduction to Consensus in Distributed Systems
24.
Understanding Consensus: Two Generals, FLP, & Byzantine Generals
4 Lessons
This chapter explores consensus challenges in distributed systems, focusing on the Two Generals problem, FLP impossibility, and Byzantine Generals problem.
Consensus Prerequisites and Two Generals' Problem
FLP Impossibility
The Byzantine Generals Problem
Let AI Evaluate Your Understanding of Consensus Fundamentals
25.
Two-phase Commit
4 Lessons
This chapter explains 2PC, a consensus protocol to ensure atomicity in distributed transactions by coordinating across nodes and handling failure challenges.
Introduction to Two-Phase Commit (2PC)
Working of the Two-Phase Commit Protocol
Failures in the Two-Phase Commit Protocol
Quiz on Two-Phase Commit
26.
State Machine Replication
10 Lessons
This chapter covers State Machine Replication, which ensures fault tolerance by using replicated state machines to maintain consistency despite failures.
Introduction to State Machine Replication
State Machines
Replication and Coordination of State Machines
Ordering Requests: Part I
Ordering Requests: Part II
Fault Tolerance for Outputs and Clients
Protocols for Maintaining Fault Tolerance: Part I
Protocols for Maintaining Fault Tolerance: Part II
SMR in Practice Via a Log
Quiz on State Machine Replication
This chapter explores the Paxos consensus algorithm, detailing its design, operation, and use in achieving reliable distributed consensus.
Introduction to Paxos
Basic Paxos Protocol Design
Basic Paxos in Action
The Rationale behind Paxos Design Choices
Multi-Paxos
Quiz on Paxos
This chapter covers Raft, a consensus algorithm ensuring consistency and fault tolerance through leader election, log replication, and cluster management.
Introduction to Raft
Raft's Basics and High-Level Workflow
Raft's Leader Election Protocol
Raft's Log Replication Protocol
Raft's Safety, Fault-Tolerance, and Availability Protocols
Raft's Cluster Membership Changes
Log Compaction and Client Interaction in Raft
Quiz on Raft
This chapter concludes the course by emphasizing applying system design principles to real-world challenges while encouraging ongoing exploration and learning.
Conclusion