Tools to Develop in Spark Locally

Learn how to set up a local development environment to work with Spark.

Technology stack

Throughout the course, we use code widgets with embedded projects. These projects can be created or reproduced by importing the widget codebase into a local development environment, which requires the following tools to be installed beforehand:

  • Apache Spark 3.0: Core technology of this course that includes all the APIs needed to develop a Spark-backed application in either Java or Scala.

Note: Strictly speaking, installing Spark is not necessary for the examples of this course as long as the projects are created as Maven projects.

  • OS: Any modern operating system, such as Windows, Linux, or macOS, works.

  • Java 8: JDK 8 is needed to write code in this course. The JRE alone might suffice if we are only interested in running the artifacts for each code example (the classes are compiled by the Java compiler, javac, and packaged into a jar file by a tool such as Maven).

  • Apache Maven: The software project management tool used in the course. Version 3.5 onwards works fine with the projects and examples developed throughout this course.

  • Git: Versioning tool used to version, back up, and organize changes to the code in projects shared by one or multiple development teams.

  • JetBrains IntelliJ IDEA: Tool to work locally with the course code. The recommended integrated development environment is the free version of IntelliJ IDEA (Community Edition).

Installation links

This is an intermediate course, so it is reasonable to assume the learner will have no issues following the links below and installing locally all the technologies listed in the previous section.

All these technologies are a solid starting point to dive into Spark development on any laptop, regardless of the OS of choice. Even if the versions of the technologies presented change in the future, they provide a solid stack to start experimenting and playing around with Spark.

Some tutorials on how to install them are provided below.

Java 8 JDK

The Java Development Kit, or JDK, can be downloaded from Oracle’s official page: Oracle’s Download Page JDK 8
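Once the JDK is installed, a quick check can confirm that the runtime in use is the expected Java version and that it is a full JDK rather than a JRE only. The following is a minimal sketch using only the Java standard library. The class name EnvCheck and its helper methods are hypothetical names chosen for illustration and are not part of the course codebase.

```java
import javax.tools.ToolProvider;

// Hypothetical helper class for illustration; not part of the course projects.
class EnvCheck {

    // Extracts the major version from strings such as "1.8.0_292" (Java 8)
    // or "11.0.2" (Java 11).
    static int majorVersion(String version) {
        String[] parts = version.split("[._\\-+]");
        int first = Integer.parseInt(parts[0]);
        // Releases up to Java 8 report themselves as "1.x".
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    // A JRE-only installation ships no system compiler, so this returns false there.
    static boolean hasCompiler() {
        return ToolProvider.getSystemJavaCompiler() != null;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println("Java version: " + version
                + " (major " + majorVersion(version) + ")");
        System.out.println("JDK compiler available: " + hasCompiler());
    }
}
```

Compiling and running this class with the installed tools (javac EnvCheck.java, then java EnvCheck) exercises both the compiler and the runtime. If the second line printed reports false, the machine is most likely running a JRE without the JDK tools.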

Apache Spark 3.0

Note: This course is designed to use Spark as a Maven dependency in projects that run locally in standalone mode. Furthermore, because Spark is just a dependency or library in a Maven project, running the applications in a cluster should be seamless, provided the application is also properly packaged through Maven.

Apache Maven

Maven can be downloaded from the Maven Official Download page, and the official tutorial on how to install it is available on the Maven website.

Git

Git is straightforward to download, as explained in the Official Git Download Page for All Platforms.

Basic knowledge of Git is helpful, but the necessary commands to work with Git and the projects are provided in the relevant lessons.

JetBrains IntelliJ IDEA

IntelliJ IDEA Community Edition is a free IDE available for each platform. It can be downloaded at this link: IDEA Community Edition

  • The following tutorial, “Run a Java Program on IntelliJ”, offers a walkthrough of the bare minimum setup to work with Java-based projects.

With all these tools installed, any developer should be up and running to begin coding, debugging, and experimenting with the Spark ecosystem in the comfort of a dedicated machine, if needed.
