Cluster Monitoring and SparkUI

Get introduced to a useful tool that ships with the Spark library: SparkUI.

Monitoring in SparkUI

The SparkUI is a user interface (UI) provided by the Spark libraries that allows the developer to inspect both the status of jobs and resource usage.

Note: The tool is found on port 4040 by default, though this is configurable. Its address is expressed as URL:PORT, where URL is the host name or IP address of the node on which the application's driver runs.
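As a minimal sketch of this addressing scheme, the following Python helper builds the UI address from a host and a port. The function name spark_ui_url and the host localhost are hypothetical, chosen only for illustration; the port itself is controlled by Spark's spark.ui.port setting.

```python
def spark_ui_url(host: str, port: int = 4040) -> str:
    """Build the SparkUI address for a given host (hypothetical helper)."""
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return f"http://{host}:{port}"


# 4040 is Spark's default; pass another value if spark.ui.port was changed.
print(spark_ui_url("localhost"))        # http://localhost:4040
print(spark_ui_url("localhost", 4041))  # http://localhost:4041
```

If port 4040 is already taken, for instance by another Spark application running on the same host, Spark tries the following ports in sequence (4041, 4042, and so on), so the second call above is a common case in practice.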

When used and understood correctly, SparkUI is a powerful tool to detect bottlenecks in a Spark application and draw conclusions as to how resources are utilized (or underutilized).

One advantage over reading the logs is that the information is displayed in real time: the visualizations in the tool allow the developer to monitor a running Spark application as well as query historical information about finished applications.
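The information shown in the UI can also be read programmatically, because Spark exposes a JSON REST API under /api/v1 at the UI address. The sketch below lists the jobs of the first application reported by the UI and counts them by status. It assumes an application whose UI is reachable at http://localhost:4040, which is a placeholder; the helper names fetch_json, count_by_status, and job_summary are hypothetical.

```python
import json
from collections import Counter
from urllib.request import urlopen


def fetch_json(url: str, timeout: float = 5.0):
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def count_by_status(jobs: list) -> dict:
    """Count jobs per status value (e.g. RUNNING, SUCCEEDED, FAILED)."""
    return dict(Counter(job["status"] for job in jobs))


def job_summary(base_url: str) -> dict:
    """Summarize job statuses for the first application listed by the UI."""
    apps = fetch_json(f"{base_url}/api/v1/applications")
    if not apps:
        return {}
    app_id = apps[0]["id"]
    jobs = fetch_json(f"{base_url}/api/v1/applications/{app_id}/jobs")
    return count_by_status(jobs)


if __name__ == "__main__":
    # Assumed address; replace with the host and port of a running application.
    try:
        print(job_summary("http://localhost:4040"))
    except OSError as error:
        print(f"SparkUI not reachable: {error}")
```

Polling job_summary at intervals gives a rough, scriptable view of what the Jobs screen shows while an application runs. The counting step is kept separate from the network calls so it can be checked against sample data without a live cluster.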

Nevertheless, combining log scrutiny with conclusions drawn from the SparkUI screens usually provides a more comprehensive picture.

Let’s go through the different parts of the UI application by running one of the earliest projects used in this course, albeit with some code removed to make things simpler:
