Spark Application - An Example

Work through an example Spark application and explore the web UIs of Spark and the Spark History Server.


The quintessential big data example has always been the word count application, which computes the number of times each word appears in a large text file. We’ll deviate from that example and instead use a movies data set to count how many movies were sequels. For easy reference, the columns of the data file that we’ll read in and process are listed below:

imdbId title releaseYear releaseDate genre writers actors directors sequel hitFlop

If a movie is a sequel to a previous movie, its sequel column is set to 1. Our task is to read the data file and then run a query that counts the rows whose sequel column is set to 1.

Our Scala program below shows how to compute the number of rows in the data set that have the sequel column set to 1. Read through the code and its comments to see what each statement does.
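A minimal sketch of such a program follows. It is not the course’s own listing. It makes three assumptions:

* The data file is a CSV with a header row naming the columns above.
* Its path is passed as the first command-line argument.
* The spark-sql library is on the classpath.

The object name SequelCount and the helpers countSequels and countSequelsSql are hypothetical. The sketch computes the count twice, once with the DataFrame filter API and once with a SQL query over a temporary view named movies, to show that both routes express the same query.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical object name; a sketch, not the course's own program.
object SequelCount {

  // Counts rows whose "sequel" column equals 1 using the DataFrame API.
  def countSequels(movies: DataFrame): Long =
    movies.filter(col("sequel") === 1).count()

  // Same count expressed as a SQL query over a temporary view.
  // COUNT(*) yields a bigint, so the single result cell is read as a Long.
  def countSequelsSql(spark: SparkSession, movies: DataFrame): Long = {
    movies.createOrReplaceTempView("movies")
    spark.sql("SELECT COUNT(*) FROM movies WHERE sequel = 1").first().getLong(0)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: the path to the movies CSV file is the first argument.
    val filePath = args(0)

    // The master URL is expected to come from spark-submit.
    val spark = SparkSession.builder().appName("SequelCount").getOrCreate()

    // Assumption: the file has a header row with the column names listed above;
    // inferSchema lets Spark read "sequel" as a numeric column.
    val movies = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(filePath)

    println(s"Sequels (DataFrame API): ${countSequels(movies)}")
    println(s"Sequels (SQL): ${countSequelsSql(spark, movies)}")

    spark.stop()
  }
}
```

Packaged as a JAR, the sketch could be launched with spark-submit, which supplies the master URL. For example: spark-submit --class SequelCount <app.jar> <path-to-movies.csv>.

Note that inferSchema makes Spark scan the file an extra time to determine column types. Supplying an explicit schema to the reader avoids that pass.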
