Requirements of Spark
Explore the functional and non-functional requirements of Spark, including data processing capabilities, latency optimization, fault tolerance, and memory management. Learn how Spark handles large datasets with efficient partitioning and worker estimations, enabling robust and scalable iterative computations. Understand how these requirements translate into designing a system capable of high throughput and fault resilience in distributed data processing.
Let's understand the functional and non-functional requirements of Spark.
Functional requirements
The functional requirements of Spark are listed below:
Data processing: The system needs to process a large working dataset efficiently and also be able to do it repeatedly for iterative or interactive queries.
Latency and throughput: Our system should achieve low latency and high throughput for tasks such as iterative data processing (where the same data is reused across steps) and ad hoc queries over the same dataset. For example, we expect the system to query many terabytes of data in a few seconds. Typically, the first run is slower than subsequent runs because the system must load the data from disk, which involves I/O; later runs can serve the working set from memory, as sketched below.
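To make these requirements concrete, here is a minimal sketch of reusing a cached dataset for repeated queries. The local session, the /data/events path, and the level column are hypothetical placeholders; the point is that the first action pays the disk I/O cost, while later ad hoc queries are served from the in-memory copy.

```scala
import org.apache.spark.sql.SparkSession

object IterativeQuerySketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical local session; a real deployment would target a cluster.
    val spark = SparkSession.builder()
      .appName("iterative-query-sketch")
      .master("local[*]")
      .getOrCreate()

    // Load a large dataset once; hypothetical input path.
    val events = spark.read.parquet("/data/events")

    // Keep the working set in memory so repeated queries skip the disk.
    events.cache()

    // First query is slower: it scans the files and populates the cache.
    val total = events.count()

    // Subsequent ad hoc queries on the same data run against memory.
    val errors   = events.filter(events("level") === "ERROR").count()
    val warnings = events.filter(events("level") === "WARN").count()

    println(s"total=$total errors=$errors warnings=$warnings")
    spark.stop()
  }
}
```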
Non-functional requirements
Following are the non-functional requirements of Spark.
Fault tolerance: If a data partition is lost, it should be recovered effectively. (For parallel processing, data is divided into chunks called data partitions.)
Data locality: The system should perform computations on the worker where the data resides, minimizing data movement over the network, as the sketch below illustrates.
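As a rough illustration of these two requirements, the sketch below checkpoints a derived dataset to a fault-tolerant store so that a lost partition can be re-read instead of recomputed from scratch, and reads its input from a distributed file system so the scheduler can place tasks on the workers that already hold each block. The checkpoint directory and the HDFS path are hypothetical, and this is only a usage-level sketch, not Spark's internal recovery mechanism.

```scala
import org.apache.spark.sql.SparkSession

object FaultToleranceSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("fault-tolerance-sketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical directory on a fault-tolerant store (e.g., HDFS).
    sc.setCheckpointDir("/checkpoints")

    // Reading from a distributed file system lets the scheduler place tasks
    // on the workers that already hold each block (data locality).
    val lines = sc.textFile("hdfs:///data/logs") // hypothetical path

    // A derived dataset produced by a chain of transformations.
    val cleaned = lines.filter(_.nonEmpty).map(_.toLowerCase)

    // Checkpointing materializes the data so a lost partition can be
    // re-read rather than recomputed from the start of the chain.
    cleaned.checkpoint()
    println(cleaned.count()) // action that triggers the checkpoint

    spark.stop()
  }
}
```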