IntelliJ: Debug and Inspect Spark Execution

Learn how to set up and debug the projects in the IntelliJ IDEA integrated development environment (IDE).

Working locally

One advantage of using Spark as a Java library is that the code can be run and, more interestingly, debugged in real time.

Debugging is particularly valuable because it lets developers test locally and, by inspecting Spark objects at runtime, understand how Spark-based applications actually behave.

This lesson demonstrates how to debug a Spark-based project, so let’s start right away.

Note: This lesson illustrates the necessary steps in the IntelliJ IDEA IDE. Plenty of resources are available on the Web for debugging in Eclipse or NetBeans, if one of those happens to be the preferred “flavor” of development environment.

Debugging the basic example

Let’s briefly recall the lesson “Running The First Spark Program."

To load the project from the code widget into IntelliJ IDEA locally, the first step is to create a simple Maven-based project, as described in the following JetBrains documentation.

Then, the necessary dependencies from the widget’s pom.xml file can be copied into the local project’s file of the same name. This allows Maven to locate and download the required Spark artifacts (libraries). Dependencies are always listed within the following XML element:

<dependencies>
  <!-- Dependencies are listed within this XML element... -->
</dependencies>
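As an illustration, a minimal dependencies section for a Spark SQL project might look like the following sketch. The version shown is an assumption; copy the exact coordinates from the widget’s pom.xml.

```xml
<dependencies>
  <!-- spark-sql pulls in Spark Core plus the DataFrame/Dataset API -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>3.3.0</version> <!-- assumed version; use the one from the widget -->
  </dependency>
</dependencies>
```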

Lastly, the project’s folder structure (that is, the way the files are organized into specific folders following the Maven convention for basic Java projects) has to be the same, so that the related classes can be found in the same Java packages defined in the code widget when execution takes place.
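The standard Maven layout for such a project looks roughly like this (the package name and data file name below are illustrative, not taken from the course code):

```
project-root/
├── pom.xml
└── src/
    └── main/
        ├── java/
        │   └── com/example/spark/        <- packages matching the code widget
        │       └── DataFrameBasicsMain.java
        └── resources/                    <- file(s) to be ingested by Spark
            └── data.csv
```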

More information on this can be found at this Maven Docs link.

Note: If the folder structure of the project is changed, the codebase has to be reorganized, including the “Resources” folder containing the file(s) to be ingested by Spark.

Debugging projects

We can proceed to debug the application’s main class, named DataFrameBasicsMain, by first placing a breakpoint in the code.
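As a sketch of what such a class might look like (the class body, file path, and resource name here are assumptions, not the course’s actual code), breakpoints on the lines below let us inspect the SparkSession and the Dataset in the debugger:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class DataFrameBasicsMain {
    public static void main(String[] args) {
        // A local master means the whole job runs inside the debugged JVM
        SparkSession spark = SparkSession.builder()
                .appName("DataFrame Basics")
                .master("local[*]")
                .getOrCreate();

        // Breakpoint here: inspect the 'spark' session before any job runs
        Dataset<Row> df = spark.read()
                .option("header", "true")
                .csv("src/main/resources/data.csv"); // hypothetical resource file

        // Breakpoint here: stepping into show() shows Spark triggering execution
        df.show();

        spark.stop();
    }
}
```

Because Spark runs in-process with a local master, pausing at either breakpoint lets us expand the session and DataFrame objects in the IDE’s variables pane.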
