Resilient Distributed Datasets of Spark

Learn about resilient distributed datasets (RDDs), the basic building block of Spark.

RDDs provide a restricted form of shared memory. The abstraction is based on coarse-grained transformations (a transformation applied to a bulk of data with the help of a function like Map and Reduce) rather than fine-grained transformations (a transformation applied to a single entity of a database). Simply put, an RDD is a dataset distributed across the memories of a cluster's worker nodes and manipulated through coarse-grained transformations.
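To make the coarse-grained idea concrete, here is a minimal plain-Python sketch. It is a toy model, not Spark's API: the nested list `partitions` and the helpers `coarse_map` and `coarse_reduce` are hypothetical names introduced only for illustration. Each inner list stands in for the slice of data held in one worker's memory.

```python
from functools import reduce
from operator import add

# Toy stand-in for a dataset held in three workers' memories (assumption:
# one inner list per worker; real Spark partitions live on separate machines).
partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

def coarse_map(parts, fn):
    """Apply fn to every record of every partition and return a new dataset."""
    return [[fn(x) for x in part] for part in parts]

def coarse_reduce(parts, fn):
    """Reduce each non-empty partition locally, then combine the partial results."""
    partials = [reduce(fn, part) for part in parts if part]
    return reduce(fn, partials)

# One function is applied to the whole dataset; the input is left untouched.
doubled = coarse_map(partitions, lambda x: x * 2)
total = coarse_reduce(doubled, add)

print(doubled)  # [[2, 4, 6], [8, 10], [12, 14, 16, 18]]
print(total)    # 90
```

A fine-grained update, by contrast, would look like `partitions[0][1] = 42`: a change to a single record in place. The sketch never does this; every operation takes one function, applies it to the bulk of the data, and produces a new dataset.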

Creation of RDDs

An RDD is an object in the language in which it is created. We can build an RDD in the following ways.

From a file

An RDD can be built from a file in a distributed file system (DFS). Each block of the file in the DFS becomes a partition of the RDD, and each record in a partition represents a line of that file.
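The block-to-partition mapping can be sketched in plain Python. This is a simplified model under stated assumptions, not how Spark or a DFS is implemented: the helper `file_to_partitions` and the tiny `block_size` of 12 bytes are hypothetical, a local temporary file stands in for a DFS file, and lines are never split across blocks. In PySpark itself, the usual entry point for this is `SparkContext.textFile(path)`.

```python
import os
import tempfile

def file_to_partitions(path, block_size):
    """Group the lines of a file into partitions of at most block_size bytes.

    Simplification: a line that does not fit starts a new block, so no
    line is ever split across two partitions.
    """
    partitions, current, current_bytes = [], [], 0
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            n = len(line.encode("utf-8"))
            if current and current_bytes + n > block_size:
                partitions.append(current)
                current, current_bytes = [], 0
            current.append(line.rstrip("\n"))  # one record per line
            current_bytes += n
    if current:
        partitions.append(current)
    return partitions

# A local temp file stands in for a file stored in a DFS.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("alpha\nbeta\ngamma\ndelta\nepsilon\n")
    path = tmp.name

parts = file_to_partitions(path, block_size=12)
os.remove(path)

print(parts)  # [['alpha', 'beta'], ['gamma', 'delta'], ['epsilon']]
```

Each inner list plays the role of one partition, and each string in it is one record, that is, one line of the file. Real DFS blocks are far larger than 12 bytes, and the actual number of partitions Spark creates depends on how the underlying storage splits the file.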
