Deploying and Running a Spark Application
Explore the process of deploying and running Apache Spark applications in both client and cluster deployment modes. Understand the roles of the cluster components, how to build uber JARs with the Maven Shade plugin, and how to execute Spark jobs with the spark-submit command. This lesson equips you with the practical knowledge to configure, package, and run Spark Java applications efficiently in cluster environments.
Cluster’s components and interactions
The previous lesson introduced a diagram for the most relevant logical and physical parts in Spark cluster mode, and it provided a view of the different interactions between them.
However, to expand on how these parts interact when a Spark application runs in a cluster, it helps to look at a more dynamic picture. The image below (similar to the previous diagram) presents this view and highlights the main components and interactions:
First, it’s important to clarify that the entities on the diagram ultimately represent logical components.
All of these components run in separate JVM processes, and they may also run on different physical machines, depending on the deployment mode, which in turn affects how the Spark application is configured and executed.
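For reference, the driver component corresponds to the process in which the application's main method runs and its SparkSession is created. A minimal sketch of such a Spark Java application could look like the following (the class name, application name, and input path are illustrative, not taken from this lesson):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Illustrative driver program; names and paths are placeholders.
public final class WordCountApp {
    public static void main(String[] args) {
        // The driver process creates the SparkSession and coordinates
        // the executors that run the distributed work.
        SparkSession spark = SparkSession.builder()
                .appName("word-count-app")
                .getOrCreate();

        // A trivial action so the executors have something to compute.
        Dataset<Row> lines = spark.read().text(args[0]);
        System.out.println("Number of lines: " + lines.count());

        spark.stop();
    }
}
```

Which physical machine this driver JVM ends up on is precisely what the deployment modes below determine.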
Application deployment modes
A Spark application can be deployed to run in the cluster following different strategies; let's briefly go over them.
Client deployment mode
Client deployment mode allows an application to be submitted from a machine in the cluster that acts as a gateway or edge node; that is, a node in the cluster is used to submit the application, and the driver process runs on that node for the duration of the execution.
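As a rough sketch, submitting in client mode against a standalone master could look like this (the class name, master URL, and JAR path are placeholders):

```bash
# Submit the packaged uber JAR in client mode:
# the driver runs on this gateway/edge node for the whole execution.
spark-submit \
  --class com.example.WordCountApp \
  --master spark://<master-host>:7077 \
  --deploy-mode client \
  /path/to/word-count-app-uber.jar \
  /data/input.txt
```

Note that --deploy-mode defaults to client, so omitting the flag has the same effect here.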
There are two ways to submit an application in client mode, and they are linked to the previous lesson's description of the cluster modes:
For instance, if the standalone mode is the cluster mode of choice, then a master ...