Architecture

Explore the fundamental architecture of Spark, including the driver and executor roles, cluster managers, and resource allocation. Understand how Spark runs in local, standalone, YARN, and Kubernetes modes, and how the cluster and client deploy modes differ, so you can manage big data processing effectively.

Spark design

Spark is a distributed, parallel data-processing framework that bears many similarities to the traditional MapReduce framework. Like MapReduce, Spark uses a leader-worker architecture: a leader process coordinates the work and distributes it among worker processes. In Spark, these two kinds of processes are formally called the driver and the executor.
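To make the leader-worker split concrete, here is a toy word count in plain Python. It is not Spark code. Threads from `concurrent.futures.ThreadPoolExecutor` stand in for worker processes, and the function names are hypothetical. The leader splits the input into partitions. Each worker processes one partition independently. The leader then merges the partial results, much as a MapReduce job would.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(lines, num_partitions):
    """Leader side: divide the input round-robin into num_partitions partitions."""
    return [lines[i::num_partitions] for i in range(num_partitions)]


def count_words(partition):
    """Worker side: process one partition independently, return a partial result."""
    counts = Counter()
    for line in partition:
        counts.update(line.split())
    return counts


def leader_word_count(lines, num_workers=3):
    """Leader: hand partitions to workers, then merge their partial results."""
    partitions = split_into_partitions(lines, num_workers)
    # Threads stand in for worker processes in this toy model.
    with ThreadPoolExecutor(max_workers=num_workers) as workers:
        partial_counts = list(workers.map(count_words, partitions))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


lines = ["spark driver", "spark executor", "driver schedules executor tasks"]
word_counts = leader_word_count(lines)
print(word_counts["spark"], word_counts["executor"], word_counts["tasks"])  # 2 2 1
```

The workers never talk to each other. All coordination, from splitting the input to merging the partial counts, happens in the leader, which is the role the driver plays in Spark.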

Driver

The driver is the leader process that manages the execution of a Spark job. It maintains the overall state of the Spark application, responds to the user's program or input, and analyzes, distributes, and schedules work across the executor processes. The driver is in essence the heart of a Spark application: it holds all application-related information for the application's entire lifetime.
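The driver's bookkeeping and scheduling role can be modeled with a small, hypothetical `ToyDriver` class. This is a simplified sketch, not Spark's scheduler. It keeps per-task state and assigns tasks to executor IDs in round-robin order, whereas real Spark also weighs factors such as free cores and data locality. It also runs each task locally, where a real driver would ship it to a remote executor.

```python
from itertools import cycle


class ToyDriver:
    """Hypothetical driver: owns all application state and hands tasks to executors."""

    def __init__(self, executor_ids):
        self.executor_ids = list(executor_ids)
        self.tasks = {}        # task id -> zero-argument callable
        self.task_state = {}   # task id -> "PENDING" or "FINISHED"
        self.assignments = {}  # task id -> executor id chosen by the scheduler
        self.results = {}      # task id -> value the task returned

    def submit(self, tasks):
        """Record new tasks (task id -> callable) as pending work."""
        for task_id, fn in tasks.items():
            self.tasks[task_id] = fn
            self.task_state[task_id] = "PENDING"

    def run(self):
        """Schedule every pending task round-robin across executors, then run it."""
        executors = cycle(self.executor_ids)
        for task_id, fn in self.tasks.items():
            if self.task_state[task_id] != "PENDING":
                continue
            executor_id = next(executors)
            self.assignments[task_id] = executor_id
            # A real driver would ship the task to a remote executor;
            # this toy simply calls it in the current process.
            self.results[task_id] = fn()
            self.task_state[task_id] = "FINISHED"
        return self.results


driver = ToyDriver(["exec-1", "exec-2"])
# Four tasks, each summing a different block of ten integers.
driver.submit({f"task-{i}": (lambda i=i: sum(range(i * 10, (i + 1) * 10))) for i in range(4)})
driver.run()
print(driver.assignments)
# {'task-0': 'exec-1', 'task-1': 'exec-2', 'task-2': 'exec-1', 'task-3': 'exec-2'}
print(driver.results)
# {'task-0': 45, 'task-1': 145, 'task-2': 245, 'task-3': 345}
```

Every piece of state (which tasks exist, where each one ran, what it returned) lives in the driver object. The executors are just names that receive work, which mirrors why the driver is described as the heart of the application.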

The Spark driver converts Spark operations into DAG (directed acyclic graph) computations and schedules and ...
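As a rough illustration of the DAG idea, the sketch below represents a hypothetical job as a graph in which each operation lists the operations it depends on. It then uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to compute waves of operations whose inputs are ready. Operations in the same wave do not depend on each other and could run in parallel. Spark's own scheduler does considerably more, for example grouping operations into stages, so treat this only as a model of dependency ordering. The operation names are made up for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical job: each operation maps to the operations it depends on.
job_dag = {
    "read_logs": [],
    "parse": ["read_logs"],
    "filter_errors": ["parse"],
    "read_users": [],
    "join_on_user": ["filter_errors", "read_users"],
    "count_by_user": ["join_on_user"],
}


def schedule_in_waves(dag):
    """Return lists of operations; each list depends only on earlier lists."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # also raises CycleError if the graph is not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every dependency of these is done
        waves.append(ready)
        sorter.done(*ready)  # mark them complete, unlocking their dependents
    return waves


for step, wave in enumerate(schedule_in_waves(job_dag)):
    print(step, wave)
# 0 ['read_logs', 'read_users']
# 1 ['parse']
# 2 ['filter_errors']
# 3 ['join_on_user']
# 4 ['count_by_user']
```

The two reads have no dependencies, so they form the first wave. The join cannot start until both of its inputs are finished.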