Big Data and Apache Spark

Learn more about big data and big data processing.

In this chapter, we will explore big data and feature a widely used big data processing framework called Apache Spark. In the next chapter, we will discuss a distributed database system.

What is big data?

Answering this question is a bit tricky, given that the definition depends a lot on the context. But let’s first start somewhere.

Big data is a large amount of data that cannot be stored or processed using traditional methods.

In traditional data processing, data is handled on a single machine using simple techniques. Some amount of data is stored on the machine's disk, and a program reads the data, extracts what is required, and processes it. If the data is small enough to be processed easily on an average machine, we do not need any specialized algorithm (like MapReduce) to process it reasonably fast. Things get more complicated when the data's size becomes unmanageable on an average machine.

The concept of big data gained popularity with the rise of the internet. A few decades ago, engineers were not particularly concerned about processing large amounts of data, for an obvious reason: the internet and internet services were nowhere near as ubiquitous as they are today. With the rise of internet services, data became the key to building better products for customers. But the amount of data produced is simply incomparable to the capacity of an average machine. As a result, the concepts and solutions behind big data processing rose in popularity.
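
To make the traditional approach concrete, here is a minimal Python sketch of that read-extract-process loop on a single machine. The file name events.log is a hypothetical placeholder; this approach only works while the file fits comfortably on one machine.

```python
# A minimal sketch of traditional, single-machine processing:
# read a file from disk, extract what we need, and aggregate it.
def count_errors(path: str) -> int:
    errors = 0
    with open(path) as f:
        for line in f:           # the whole dataset streams through one process
            if "ERROR" in line:  # extract only the records we care about
                errors += 1
    return errors

if __name__ == "__main__":
    print(count_errors("events.log"))  # hypothetical log file
```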

Processing big data is a challenging problem, and most companies that operate at scale need to solve it sooner or later.

Classification

We can classify big data into three categories.

  • Structured: This type of data has a well-defined structure. Think of a SQL table: all the columns and their data types are predefined. With structured big data, we know in advance what data will be there and what the fields and their types are.

  • Semi-structured: Semi-structured data does not follow a strict schema, but it still has some structure. For example, consider the log data from an application, where every line has a timestamp followed by a string. The string, however, can contain a variety of data, like error objects, application-related metrics, etc. (See the sketch after this list.)

  • Unstructured: As the name suggests, unstructured data has no inherent structure. Image and video files are good examples. Unstructured data usually comes with associated metadata describing its properties, and that metadata tends to be structured or semi-structured. For example, an image has properties like resolution, size, date of creation, etc. All of this information is captured separately as well-defined metadata.
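
To illustrate the semi-structured case, here is a small sketch. The log lines are made up for illustration: each line starts with a timestamp (the structured part), but the rest of the line is a free-form payload that may or may not be JSON.

```python
import json

# Hypothetical semi-structured log lines: a timestamp, then an arbitrary payload.
lines = [
    '2024-05-01T10:00:00Z {"level": "ERROR", "code": 500}',
    '2024-05-01T10:00:05Z user 42 logged in',
]

for line in lines:
    timestamp, _, payload = line.partition(" ")  # the timestamp part is predictable
    try:
        record = json.loads(payload)             # sometimes the payload is JSON...
        print(timestamp, "structured payload:", record)
    except json.JSONDecodeError:
        print(timestamp, "free-form payload:", payload)  # ...and sometimes it is not
```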

Apache Spark

We know that processing big data is challenging. But over time, researchers and engineers have developed well-defined mechanisms for it. Because many companies started working on big data, and there was a lot of overlap in their needs, frameworks were developed to make things easier for everyone. One such framework is Apache Spark.

In reality, Spark is no longer just a framework—it has become an ecosystem. It’s an analytics engine with a wide variety of support for different big data use cases, like batch and stream processing, machine learning, etc.

So what does Apache Spark provide that helps with big data processing?

Like many frameworks, Spark provides excellent APIs and data structures in popular languages like Java, Scala, Python, etc. Armed with this support, engineers can build complex data-processing pipelines in a much simpler way.
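
As a small taste of those APIs, here is a minimal PySpark sketch, assuming pyspark is installed and run locally; the dataset is made up for illustration.

```python
from pyspark.sql import SparkSession

# Start a local Spark session (all executors run on this machine).
spark = SparkSession.builder.appName("intro").master("local[*]").getOrCreate()

# A tiny, made-up structured dataset: (user, country) pairs.
df = spark.createDataFrame(
    [("alice", "US"), ("bob", "IN"), ("carol", "US")],
    ["user", "country"],
)

# Declarative, SQL-like processing: count users per country.
df.groupBy("country").count().show()

spark.stop()
```

The same DataFrame API is available in Java and Scala with near-identical method names, which is part of what makes Spark approachable across teams.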

Spark enables us to write code as if it were running sequentially on a single machine. In the background, Spark takes care of partitioning the input data and processing it in parallel so that the work finishes fast. This makes writing big data solutions much simpler for developers.
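
The classic word count illustrates this: the code below reads like a straight-line program, yet Spark splits the data into partitions and runs the transformations in parallel. This is a sketch using a small in-memory dataset rather than a real distributed file.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount").master("local[*]").getOrCreate()
sc = spark.sparkContext

# The code reads sequentially, but Spark executes it across parallel partitions.
counts = (
    sc.parallelize(["big data", "big spark", "spark"])  # toy in-memory input
      .flatMap(lambda line: line.split())               # split lines into words
      .map(lambda word: (word, 1))                      # one (word, 1) pair per word
      .reduceByKey(lambda a, b: a + b)                  # sum counts per word, in parallel
)
print(counts.collect())  # e.g., [('big', 2), ('data', 1), ('spark', 2)]

spark.stop()
```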

Key takeaways

  • Big data refers to a large volume of data that requires special treatment for processing due to its size.

  • Different frameworks have been developed to facilitate big data processing. One such example is Apache Spark, which makes big data processing much simpler for developers.
