Spark API

Get an introduction to the three Spark data abstractions: RDDs, DataFrames, and Datasets.


APIs

Spark offers APIs and data abstractions that significantly improve the developer experience. The original Spark paper describes a low-level abstraction called the Resilient Distributed Dataset (RDD). The higher-level DataFrames and Datasets were added later. Spark enables distributed data processing by applying functional transformations to collections of data (RDDs). As a result, a program written with the Spark API is typically much shorter than the same program written in another framework such as MapReduce. The Spark APIs are:

  • Resilient Distributed Datasets

  • DataFrames

  • Datasets
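To make "functional transformations of collections of data" concrete, here is a toy, single-process sketch in plain Python. It is not Spark. The class ToyRDD and its round-robin partitioning are hypothetical, although the method names map, flatMap, reduceByKey, and collect mirror methods on Spark's RDD API. A word count written this way is a short chain of calls, which is the kind of brevity the paragraph above contrasts with MapReduce.

```python
class ToyRDD:
    """Hypothetical in-memory stand-in for an RDD (NOT Spark's API):
    a list of partitions, each transformed by pure functions."""

    def __init__(self, partitions):
        # Each partition is a plain list; in Spark, partitions live on executors.
        self.partitions = [list(p) for p in partitions]

    @classmethod
    def parallelize(cls, data, num_partitions=2):
        data = list(data)
        # Round-robin split of the input into num_partitions lists.
        return cls([data[i::num_partitions] for i in range(num_partitions)])

    def map(self, f):
        # Apply f to every element, partition by partition (no data moves).
        return ToyRDD([[f(x) for x in p] for p in self.partitions])

    def flatMap(self, f):
        # Apply f to every element and flatten the resulting iterables.
        return ToyRDD([[y for x in p for y in f(x)] for p in self.partitions])

    def reduceByKey(self, f):
        # Combine values per key inside each partition first, then merge the
        # partial results into one partition (a simplified stand-in for a shuffle).
        merged = {}
        for p in self.partitions:
            local = {}
            for k, v in p:
                local[k] = f(local[k], v) if k in local else v
            for k, v in local.items():
                merged[k] = f(merged[k], v) if k in merged else v
        return ToyRDD([list(merged.items())])

    def collect(self):
        # Gather all partitions back into a single Python list.
        return [x for p in self.partitions for x in p]


lines = ToyRDD.parallelize(["to be or not", "to be"], num_partitions=2)
counts = (
    lines.flatMap(lambda line: line.split())
    .map(lambda word: (word, 1))
    .reduceByKey(lambda a, b: a + b)
)
print(sorted(counts.collect()))  # → [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```

One deliberate simplification: this toy evaluates every transformation immediately, whereas Spark records transformations lazily and only computes them when an action such as collect is called.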
