Overview

Data processing is an essential part of many software applications. Most engineers don’t even think about it as something separate from programming. However, whenever we transform information in some way, whether for reporting, data aggregation, or analytics, we’re doing data processing.

Thanks to the Erlang Virtual Machine (also known as the BEAM), everyone who uses Elixir benefits from its fantastic concurrency model, which makes the language particularly well-suited for long-running, concurrent tasks. As a result, Elixir offers more ways to perform concurrent work than most other languages.

This course is here to help you navigate the world of concurrency tools available in the Elixir ecosystem. We’ll learn about the most popular modules and libraries and start using them in no time. We’ll also discover a range of new techniques that help us simplify our products, improve the performance of our code, and make our applications more resilient to errors and increased workloads.

Structure

Get started with concurrency

Let’s get started on the journey of learning how concurrency works in Elixir. We’ll be introduced to the following concepts:

  • The Task module

  • Processes

  • Timeouts

We’ll also cover a few other topics that lay the foundation for the chapters that follow; the sketch below offers a first taste of the Task module.
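
Here’s a minimal sketch, assuming nothing beyond the standard library: a computation runs in a separate process via Task.async/1, and Task.await/2 blocks until the result arrives. The one-second delay and five-second timeout are arbitrary illustrative values.

    # Run a computation in a separate process.
    task =
      Task.async(fn ->
        # Simulate a slow computation.
        Process.sleep(1_000)
        2 + 2
      end)

    # Block until the result arrives, or exit after five seconds.
    result = Task.await(task, 5_000)
    IO.puts("Result: #{result}")

If the work takes longer than the timeout, Task.await/2 exits the calling process, a failure mode we’ll look at when we discuss timeouts.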

GenServers and supervisors

Next, we turn to GenServer and supervisors:

  • We’ll learn to create and configure GenServer processes by building a simple job processing system, previewed in the sketch after this list.

  • We’ll introduce the Supervisor behaviour and talk about how Elixir achieves fault-tolerance.
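
To preview the shape of that system, here’s a minimal GenServer sketch that keeps a queue of jobs in its state. The JobQueue name and its tiny API are illustrative assumptions, not the job processing system we’ll actually build:

    defmodule JobQueue do
      use GenServer

      # Client API
      def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, [], opts)
      def add_job(pid, job), do: GenServer.cast(pid, {:add, job})
      def next_job(pid), do: GenServer.call(pid, :next)

      # Server callbacks
      @impl true
      def init(jobs), do: {:ok, jobs}

      @impl true
      def handle_cast({:add, job}, jobs), do: {:noreply, jobs ++ [job]}

      @impl true
      def handle_call(:next, _from, [job | rest]), do: {:reply, job, rest}
      def handle_call(:next, _from, []), do: {:reply, nil, []}
    end

Because use GenServer defines a child spec, the same module can be placed under a supervisor, for example Supervisor.start_link([JobQueue], strategy: :one_for_one), and be restarted automatically when it crashes.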

Data processing pipelines

We now move on to data processing pipelines:

  • We will learn about back-pressure and the building blocks of GenStage—producer, consumer, and producer-consumer—as sketched after this list.

  • We will also start building our very own web scraper by putting what we have learned into practice.
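
As a preview, here’s a minimal sketch of those building blocks, closely following the counter example from the gen_stage documentation (it assumes the gen_stage dependency is installed; the Counter and Printer names are illustrative). Back-pressure comes from handle_demand/2: the producer emits only as many events as consumers have asked for.

    defmodule Counter do
      use GenStage

      def start_link(initial),
        do: GenStage.start_link(__MODULE__, initial, name: __MODULE__)

      @impl true
      def init(counter), do: {:producer, counter}

      # Emit exactly as many events as were demanded (back-pressure).
      @impl true
      def handle_demand(demand, counter) when demand > 0 do
        events = Enum.to_list(counter..(counter + demand - 1))
        {:noreply, events, counter + demand}
      end
    end

    defmodule Printer do
      use GenStage

      def start_link(_), do: GenStage.start_link(__MODULE__, :ok)

      @impl true
      def init(:ok), do: {:consumer, :ok, subscribe_to: [Counter]}

      # Receive batches of events from the producer and print them.
      @impl true
      def handle_events(events, _from, state) do
        Enum.each(events, &IO.inspect/1)
        {:noreply, [], state}
      end
    end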

Use the Flow module

There are different ways we can use Flow:

  • We can use Flow instead of GenStage for operations like map, filter, reduce, and more (see the sketch after this list).

  • We’ll learn how to use Flow when working with large datasets and even plug it into existing GenStage data processing pipelines.
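
As a small example of those operations working together, here’s the classic word-count sketch from the Flow documentation (it assumes the flow dependency is installed; words.txt is a placeholder filename):

    # Count word occurrences across a file, in parallel stages.
    File.stream!("words.txt")
    |> Flow.from_enumerable()
    |> Flow.flat_map(&String.split(&1, " "))
    # Route identical words to the same partition before reducing.
    |> Flow.partition()
    |> Flow.reduce(fn -> %{} end, fn word, acc ->
      Map.update(acc, word, 1, &(&1 + 1))
    end)
    |> Enum.to_list()

Flow.partition/1 matters here: without it, the same word could end up counted by several independent reducers.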

Set up a data ingestion pipeline

We’ll set up a data-ingestion pipeline with Broadway and RabbitMQ, but the techniques also apply to other message brokers, such as Amazon SQS, Apache Kafka, and Google Cloud Pub/Sub. We’ll cover the various options and benefits that come with Broadway.
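
As a preview, here’s a minimal Broadway sketch that consumes from RabbitMQ via the broadway_rabbitmq producer. The IngestionPipeline module name, the "events" queue, and the concurrency settings are illustrative assumptions:

    defmodule IngestionPipeline do
      use Broadway

      def start_link(_opts) do
        Broadway.start_link(__MODULE__,
          name: __MODULE__,
          producer: [
            module: {BroadwayRabbitMQ.Producer, queue: "events"},
            concurrency: 1
          ],
          processors: [
            default: [concurrency: 10]
          ]
        )
      end

      # Each message flows through here; returning it marks it as
      # successfully processed so Broadway can acknowledge it.
      @impl true
      def handle_message(_processor, message, _context) do
        IO.inspect(message.data, label: "received")
        message
      end
    end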

About the code

We can’t apply all the techniques in this course without having data to process or services to integrate with. At the same time, downloading large data sets or signing up for third-party services is too cumbersome and not practical for this course. That’s why all projects attempt to simulate real-world use cases, letting us focus on the implementation details. This also makes them easy to reproduce.