Spark API

This lesson introduces the data abstractions available in Spark.

Spark API

Spark offers APIs and data abstractions that significantly enhance the developer experience. The original Spark paper describes the low-level abstraction, the Resilient Distributed Dataset (RDD). Two higher-level abstractions, DataFrames and Datasets, were added later. Spark enables distributed data processing through functional transformations of data collections (RDDs). Programs written against the Spark API are typically much shorter than equivalent programs in frameworks such as MapReduce. The three data abstractions available in Spark are:

  • Resilient Distributed Datasets

  • DataFrames

  • Datasets
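To make "functional transformations of data collections" concrete, here is a minimal word-count sketch in plain Python. It does not use Spark and runs on a single machine. The helpers `flat_map` and `reduce_by_key` are hypothetical stand-ins written for this illustration; they only mimic the shape of the RDD operations `flatMap` and `reduceByKey`.

```python
from collections import defaultdict
from functools import reduce
from itertools import chain


def flat_map(func, items):
    """Apply func to each item and flatten the results into one list.

    Hypothetical local stand-in for the RDD operation flatMap.
    """
    return list(chain.from_iterable(func(item) for item in items))


def reduce_by_key(func, pairs):
    """Combine the values of each key with func.

    Hypothetical local stand-in for the RDD operation reduceByKey.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}


# An in-memory list plays the role of the distributed collection.
lines = ["spark makes big data simple", "big data needs spark"]

# Word count as a chain of functional transformations.
words = flat_map(lambda line: line.split(), lines)   # split lines into words
pairs = [(word, 1) for word in words]                # pair each word with 1
counts = reduce_by_key(lambda a, b: a + b, pairs)    # sum the 1s per word

print(counts)
```

The whole job is three short transformations, which hints at why Spark programs tend to be compact. In Spark, the same chain would run on an RDD whose partitions are spread across a cluster, rather than on a local list.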
