Architecture

This lesson describes the architecture of Spark.

Architecture

Spark is a distributed, parallel data-processing framework that bears many similarities to the traditional MapReduce framework. Like MapReduce, Spark follows a master-slave architecture, in which one process, the master, coordinates and distributes work among the slave processes. In Spark, these two processes are formally called:

  • Driver
  • Executor

Driver

The driver is the master process that manages the execution of a Spark job. It maintains the overall state of the Spark application, responds to the user’s program or input, and analyzes, distributes, and schedules work among the executor processes. The driver is the heart of a Spark application and holds all application-related information for the application’s lifetime.
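As a rough sketch of where the driver comes from, the Scala snippet below builds a SparkSession inside the user's main program; the JVM that runs this program acts as the driver. The application name and the local master URL are illustrative choices, not values taken from this lesson.

    import org.apache.spark.sql.SparkSession

    object DriverSketch {
      def main(args: Array[String]): Unit = {
        // Creating a SparkSession starts the driver: it holds the
        // application's state and plans and schedules work for executors.
        val spark = SparkSession.builder()
          .appName("driver-sketch")   // illustrative name
          .master("local[*]")         // local mode, chosen here only for a runnable example
          .getOrCreate()

        // ... user program: define datasets and transformations here ...

        spark.stop()                  // shuts the application down
      }
    }

In a real cluster deployment the master URL would point at a cluster manager instead of local mode, but the driver's role is the same.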

Executor

Executors are the slave processes that execute the code assigned to them by the driver and report the state of their computations back to the driver.
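To make the division of labor concrete, here is a minimal sketch that continues the hypothetical session above: the driver records the transformation and, when the action runs, schedules tasks; the executors compute on their partitions and send the small final result back to the driver.

    // Assumes `spark` is the SparkSession created by the driver above.
    val numbers = spark.sparkContext.parallelize(1 to 1000000, numSlices = 8)

    // The driver only records this transformation; no work happens yet.
    val squares = numbers.map(n => n.toLong * n)

    // The action triggers the driver to schedule tasks; executors compute
    // partial results on their partitions and report back to the driver.
    val total = squares.reduce(_ + _)
    println(s"sum of squares = $total")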