Introduction

Learn about the history and evolution of Spark, the ubiquitous, unified big data processing platform.

Getting started with Spark

Spark has become the ubiquitous platform for data processing and has largely displaced the traditional MapReduce framework. In fact, some technologists go so far as to declare MapReduce dead. In numerous benchmarks and performance studies, Spark has outperformed MapReduce by a wide margin, in some workloads by an order of magnitude or more. Below, we briefly recount the history behind Spark's dominance in the big data space.

History

The big data movement began in earnest with Google’s ambition to index the World Wide Web and make it searchable for users at lightning speed. This effort produced:

  • Google File System (GFS): A fault-tolerant distributed file system running on clusters of cheap commodity hardware.

  • Bigtable: A scalable store of structured data on top of GFS.

  • MapReduce: A new parallel programming paradigm that allows for processing large amounts of data distributed across GFS ...