Deploying and Running a Spark Application
Explore the process of deploying and running Apache Spark applications in both client and cluster deployment modes. Understand the roles of the cluster components, how to build uber JARs with the Maven Shade plugin, and how to execute Spark jobs with the spark-submit command. This lesson equips you with the practical knowledge to configure, package, and run Spark Java applications efficiently in cluster environments.
Cluster’s components and interactions
The previous lesson introduced a diagram for the most relevant logical and physical parts in Spark cluster mode, and it provided a view of the different interactions between them.
However, to expand on how these parts interact when a Spark application runs in a cluster, it helps to look at a more dynamic picture. The image below (similar to the previous diagram) presents this view and highlights the main components and interactions:
First, it’s important to clarify that the entities on the diagram ultimately represent logical components.
All of these components run in separate JVM processes, and they may also run on different physical machines, depending on the deployment mode, which in turn affects how the Spark application is configured and executed.
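For reference, the driver component corresponds to the process in which the application's main method runs and its SparkSession is created. A minimal sketch of such a Spark Java application could look like the following (the class name, application name, and input path are illustrative, not taken from this lesson):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Illustrative driver program; names and paths are placeholders.
public final class WordCountApp {
    public static void main(String[] args) {
        // The driver process creates the SparkSession and coordinates
        // the executors that run the distributed work.
        SparkSession spark = SparkSession.builder()
                .appName("word-count-app")
                .getOrCreate();

        // A trivial action so the executors have something to compute.
        Dataset<Row> lines = spark.read().text(args[0]);
        System.out.println("Number of lines: " + lines.count());

        spark.stop();
    }
}
```

Which physical machine this driver JVM ends up on is precisely what the deployment modes below determine.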
Application deployment modes
A Spark application can be deployed to run in the cluster following different strategies; let's briefly go over them.
Client deployment mode
Client deployment mode allows an application to be submitted from a machine in the cluster that acts as a gateway or edge node; that is, a node in the cluster is used to submit the application, and the driver process runs on that node for the duration of the execution.
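As a rough sketch, submitting in client mode against a standalone master could look like this (the class name, master URL, and JAR path are placeholders):

```bash
# Submit the packaged uber JAR in client mode:
# the driver runs on this gateway/edge node for the whole execution.
spark-submit \
  --class com.example.WordCountApp \
  --master spark://<master-host>:7077 \
  --deploy-mode client \
  /path/to/word-count-app-uber.jar \
  /data/input.txt
```

Note that --deploy-mode defaults to client, so omitting the flag has the same effect here.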
There are two ways to submit an application in client mode, and they are linked to the previous lesson's description of the cluster modes:
For instance, if the standalone mode is the cluster mode of choice, then a master ...