Spark and Big Data
Learn about the fundamentals of big data and how Apache Spark fits into processing large datasets. Discover the big data life cycle, batch processing techniques, and how Spark handles data ingestion, transformation, and distributed processing for scalable computation.
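Although Spark's APIs are introduced later in the course, a minimal sketch can make the ingestion, transformation, and distributed processing mentioned above concrete. This is an illustrative PySpark example only; the file name (ratings.csv) and its columns (movie_id, rating) are hypothetical placeholders, not part of the course material.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Entry point to Spark; in a cluster deployment this coordinates
# executors spread across many machines.
spark = SparkSession.builder.appName("BigDataPrimer").getOrCreate()

# Ingestion: Spark reads the file into partitions that can be
# processed in parallel across the cluster.
ratings = spark.read.csv("ratings.csv", header=True, inferSchema=True)

# Transformation: declared lazily; nothing executes yet.
high = ratings.filter(F.col("rating") >= 4.0)
avg_by_movie = high.groupBy("movie_id").agg(F.avg("rating").alias("avg_rating"))

# Action: triggers distributed execution of the whole plan.
avg_by_movie.show(5)

spark.stop()
```

The key design point this illustrates is that Spark separates the *description* of a computation (the lazy transformations) from its *execution* (the action), which is what lets it schedule work across a cluster when a dataset is too large for one machine.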
Big data primer
Before we describe the processing model that Spark fits into, both in the context of this course and of big data in general, it's important to explain what big data means.
The term big data fundamentally refers to a family of technologies, each aligned with a different strategy for processing large datasets.
The word “large” has traditionally carried the implicit notion that the dataset being processed holds more information than a single resource, such as a lone server or computer, can realistically handle. Because available processing power and business needs change constantly, the word also implies that the size of such a dataset is never pinned to a specific figure.
As vague as it might seem, “big” is an appropriate word for datasets that are not bounded by any fixed size limit while still representing vast volumes ...