
Architecture

Explore Spark's distributed architecture, including the roles of the driver and executors. Understand how Spark interacts with cluster managers like YARN and the differences between cluster and client execution modes to manage and run Big Data jobs efficiently.


Spark is a distributed, parallel data-processing framework that bears many similarities to the traditional MapReduce framework. Like MapReduce, Spark uses a master-slave architecture: a single master process coordinates and distributes work among the slave processes. In Spark, these two kinds of processes are formally called:

  • Driver
  • Executor
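How the driver and executors are laid out is decided when the application is submitted. As a rough sketch (assuming a YARN cluster manager and a hypothetical `app.py` application script; the flags are standard `spark-submit` options, but the values are illustrative, not recommendations):

```shell
# Submit a Spark application to a YARN cluster manager.
# --deploy-mode cluster : the driver runs inside the cluster, on a worker node
# --deploy-mode client  : the driver runs on the machine that invoked spark-submit
# --num-executors / --executor-cores / --executor-memory size the executor processes
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-cores 2 \
  --executor-memory 4g \
  app.py   # hypothetical application script
```

In cluster mode the submitting machine can disconnect once the job is accepted, which suits production jobs; client mode keeps the driver local, which is convenient for interactive use and debugging.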

Driver

The driver is the master process that manages the execution of a Spark job. It is responsible for maintaining the overall state of the Spark application, ...