What Does This Course Cover?
Get an overview of the topics covered by this course.
Overview
Data processing is an essential part of many software applications. Most engineers don’t even think about it as something separate from programming. However, if we’re transforming information in some way, such as reporting, data aggregation, or analytics, then we’re doing data processing.
Thanks to the Erlang Virtual Machine (also known as the BEAM), Elixir has a fantastic concurrency model and is particularly well-suited for long-running, concurrent tasks. As a result, Elixir offers more ways to perform concurrent work than many other languages.
This course is here to help you navigate the world of concurrency tools available in the Elixir ecosystem. We’ll learn about the most popular modules and libraries and start using them in no time. We’ll also discover a range of new techniques that help us simplify our product, improve the performance of our code, and make our application more resilient to errors and increased workloads.
Structure
Get started with concurrency
Let’s get started on the journey of learning how concurrency works in Elixir. We'll be introduced to the following concepts:
- The `Task` module
- Processes
- Timeouts

We'll also cover some other topics that lay the foundation for the following chapters to build upon.
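As a small taste of what's ahead, here is a minimal sketch of running work concurrently with the `Task` module and awaiting the result with a timeout (the summed range is just an illustrative computation):

```elixir
# Task.async/1 runs the function in a separate process;
# Task.await/2 blocks until it replies, or raises after the timeout.
task = Task.async(fn -> Enum.sum(1..1_000) end)

result = Task.await(task, 5_000)  # wait up to 5 seconds

IO.inspect(result)
```

If the computation takes longer than the timeout, `Task.await/2` exits, which is one of the failure modes we'll learn to handle.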
GenServers and supervisors
We'll learn about `GenServer` and supervisors:
- We'll learn to create and configure `GenServer` processes by building a simple job processing system.
- We'll introduce the `Supervisor` behaviour and talk about how Elixir achieves fault tolerance.
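To preview the `GenServer` pattern, here is a minimal, hypothetical job queue sketch; the module name and API are illustrative, not the course's actual project:

```elixir
defmodule JobQueue do
  use GenServer

  # Client API
  def start_link(_opts), do: GenServer.start_link(__MODULE__, [], name: __MODULE__)
  def enqueue(job), do: GenServer.cast(__MODULE__, {:enqueue, job})
  def pop, do: GenServer.call(__MODULE__, :pop)

  # Server callbacks: the state is simply a list of pending jobs.
  @impl true
  def init(jobs), do: {:ok, jobs}

  @impl true
  def handle_cast({:enqueue, job}, jobs), do: {:noreply, jobs ++ [job]}

  @impl true
  def handle_call(:pop, _from, [job | rest]), do: {:reply, job, rest}
  def handle_call(:pop, _from, []), do: {:reply, nil, []}
end
```

Because all messages are processed sequentially by the server process, the state never needs locks, which is a big part of what makes GenServers so pleasant to work with.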
Data processing pipelines
We now move on to data processing pipelines:
- We will learn about back-pressure and the building blocks of `GenStage`: producer, consumer, and producer-consumer.
- We will also start building our very own web scraper by putting what we have learned into practice.
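To preview the shape of those building blocks, here is a minimal producer/consumer sketch along the lines of the `GenStage` documentation. It assumes the `gen_stage` hex package is installed, and the module names are illustrative:

```elixir
defmodule Counter do
  use GenStage

  def start_link(initial), do: GenStage.start_link(__MODULE__, initial, name: __MODULE__)

  @impl true
  def init(counter), do: {:producer, counter}

  # The producer only emits events when consumers ask for them:
  # this demand-driven flow is what provides back-pressure.
  @impl true
  def handle_demand(demand, counter) do
    events = Enum.to_list(counter..(counter + demand - 1))
    {:noreply, events, counter + demand}
  end
end

defmodule Printer do
  use GenStage

  def start_link(_opts), do: GenStage.start_link(__MODULE__, :ok)

  @impl true
  def init(:ok), do: {:consumer, :ok, subscribe_to: [Counter]}

  @impl true
  def handle_events(events, _from, state) do
    IO.inspect(events)
    {:noreply, [], state}
  end
end
```

A producer-consumer sits between these two roles, transforming events while propagating demand upstream.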
Use the `Flow` module
There are different ways we can use `Flow`:
- We can use `Flow` instead of `GenStage` for operations like `map`, `filter`, `reduce`, and more.
- We'll learn how to use `Flow` when working with large datasets and even plug it into existing `GenStage` data processing pipelines.
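As a taste, here is a hedged sketch of a concurrent word count with `Flow`; it assumes the `flow` hex package is installed, and the input list stands in for a real data source:

```elixir
# Each Flow operation runs across multiple stages (processes),
# so the work is parallelized automatically.
["the quick brown fox", "the lazy dog"]
|> Flow.from_enumerable()
|> Flow.flat_map(&String.split(&1, " "))
|> Flow.partition()  # route equal words to the same stage before reducing
|> Flow.reduce(fn -> %{} end, fn word, acc ->
  Map.update(acc, word, 1, &(&1 + 1))
end)
|> Enum.to_list()
```

The `Flow.partition/1` call matters: without it, different stages could each hold a partial count for the same word.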
Set up a data ingestion pipeline
We’ll set up a data-ingestion pipeline using RabbitMQ, but the techniques also apply to other message brokers, such as Amazon SQS, Apache Kafka, and Google Cloud Pub/Sub. We’ll cover the various options and benefits that come with Broadway.
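For a preview, here is a minimal, hypothetical Broadway pipeline skeleton; it assumes the `broadway` and `broadway_rabbitmq` hex packages are installed, and the queue name `"events"` and module name are illustrative:

```elixir
defmodule MyPipeline do
  use Broadway

  def start_link(_opts) do
    Broadway.start_link(__MODULE__,
      name: __MODULE__,
      producer: [
        # Pulls messages from a RabbitMQ queue with back-pressure built in.
        module: {BroadwayRabbitMQ.Producer, queue: "events"},
        concurrency: 1
      ],
      processors: [
        default: [concurrency: 10]
      ]
    )
  end

  @impl true
  def handle_message(_processor, message, _context) do
    # Process message.data here; returning the message acknowledges it.
    message
  end
end
```

Swapping the producer module is largely all it takes to target a different broker, which is why the same techniques carry over to SQS, Kafka, or Pub/Sub.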
About the code
We can't apply the techniques in this course without data to process or services to integrate with. At the same time, downloading large datasets or signing up for third-party services would be too cumbersome and impractical for this course. That's why all projects simulate real-world cases, letting us focus on the implementation details. It also makes them easy to reproduce.