Running Spark Applications
Learn how to run Spark applications by creating a SparkSession and SparkContext programmatically, understand the client and cluster deployment modes on YARN, and monitor jobs with the Spark History Server for effective Spark job management.
In previous lessons, when we fired up the spark-shell, we interacted with an object of type SparkSession, represented by the variable spark. Starting with Spark 2.0, SparkSession is the single unified entry point for manipulating data with Spark. There's a one-to-one correspondence between a Spark application and a SparkSession: each Spark application is associated with exactly one SparkSession. SparkSession has another field, SparkContext, which represents the connection to the Spark cluster. The SparkContext can create RDDs, accumulators, and broadcast variables, and can run code on the cluster.
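As a rough sketch of this relationship, the following Java program builds a SparkSession and then reaches into its SparkContext field. It assumes Spark's spark-sql artifact is on the classpath; the application name and the local[*] master are illustrative (on a YARN cluster the master is normally supplied by spark-submit rather than hard-coded):

```java
import org.apache.spark.SparkContext;
import org.apache.spark.sql.SparkSession;

public class SparkSessionSketch {
    public static void main(String[] args) {
        // Build (or reuse) the single SparkSession for this application.
        // master("local[*]") runs Spark locally on all available cores;
        // it is used here only so the sketch runs without a cluster.
        SparkSession spark = SparkSession.builder()
                .appName("spark-session-demo") // illustrative name
                .master("local[*]")
                .getOrCreate();

        // The SparkContext is exposed as a field of the SparkSession;
        // it represents the connection to the Spark cluster.
        SparkContext sc = spark.sparkContext();
        System.out.println("Application id: " + sc.applicationId());

        spark.stop();
    }
}
```

Because getOrCreate() returns any session that already exists in the JVM, calling it twice yields the same SparkSession, which is what enforces the one-to-one correspondence described above.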
The illustration below shows how Spark interacts with and runs jobs on a Hadoop cluster.
Next, we’ll rewrite the example executed on the spark-shell to count the car makes as a Java application. We’ll explicitly create the SparkSession and the SparkContext programmatically before writing the core ...
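One possible shape of such a Java application is sketched below. It is an assumption-laden illustration, not the lesson's actual code: the input file cars.csv and its layout (the car make in the first comma-separated column) are hypothetical stand-ins for whatever data file the lesson uses:

```java
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;
import scala.Tuple2;

public class CarMakeCount {
    public static void main(String[] args) {
        // Explicitly create the SparkSession, as we did interactively
        // in the spark-shell where it was pre-built for us.
        SparkSession spark = SparkSession.builder()
                .appName("car-make-count") // illustrative name
                .master("local[*]")        // omit when submitting to YARN
                .getOrCreate();

        // Wrap the session's SparkContext for the Java RDD API.
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        // "cars.csv" and its column layout are hypothetical; substitute
        // the lesson's real data file. We assume the make is column 0.
        JavaRDD<String> lines = jsc.textFile("cars.csv");
        JavaPairRDD<String, Integer> makeCounts = lines
                .mapToPair(line -> new Tuple2<>(line.split(",")[0], 1))
                .reduceByKey(Integer::sum);

        makeCounts.collect().forEach(pair ->
                System.out.println(pair._1() + " -> " + pair._2()));

        spark.stop();
    }
}
```

Packaged into a jar, a program like this would be launched with spark-submit, which is also where the client or cluster deploy mode on YARN is chosen.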