
Build the Data Processing Pipeline

Explore how to build a data processing pipeline with Elixir's GenStage by creating producer and consumer stages. Understand GenStage callbacks like handle_demand and handle_events to manage event flow, and learn how to set up dynamic subscriptions. This lesson helps you build an efficient pipeline for concurrent data scraping and processing.

Complex use cases may require a data processing pipeline with one or more producers, a consumer stage, and several producer-consumers in between. The main principles, however, stay the same, so we'll start with a two-stage pipeline and demonstrate how it works.

We will build a fake service that scrapes data from web pages—normally an intensive task dependent on system resources and a reliable network connection. Our goal is to request a number of URLs to be scraped and have the data pipeline take care of the workload.
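As a preview of where we are headed, here is a minimal sketch of such a two-stage pipeline. The module names PageProducer and PageConsumer, the max_demand value, and the sleep that stands in for real scraping are all illustrative assumptions, not the lesson's final implementation:

```elixir
# Hypothetical sketch of a two-stage GenStage pipeline.
# PageProducer hands out URLs on demand; PageConsumer "scrapes" them.
defmodule PageProducer do
  use GenStage

  def start_link(_args) do
    # Start the producer with an empty state; no pages are queued yet.
    GenStage.start_link(__MODULE__, [], name: __MODULE__)
  end

  def init(initial_state) do
    {:producer, initial_state}
  end

  # Invoked by GenStage whenever downstream consumers ask for events.
  def handle_demand(demand, state) do
    IO.puts("PageProducer received demand for #{demand} pages")
    # Nothing to emit yet; later we will dispatch URLs here.
    {:noreply, [], state}
  end
end

defmodule PageConsumer do
  use GenStage

  def start_link(_args) do
    GenStage.start_link(__MODULE__, [])
  end

  def init(initial_state) do
    # Subscribe to the producer; max_demand caps in-flight pages.
    {:consumer, initial_state, subscribe_to: [{PageProducer, max_demand: 3}]}
  end

  # Receives batches of URLs and pretends to scrape each one.
  def handle_events(events, _from, state) do
    Enum.each(events, fn url ->
      IO.puts("Scraping #{url}...")
      Process.sleep(1_000)
    end)

    # Consumers never emit events, so the event list is always empty.
    {:noreply, [], state}
  end
end
```

The consumer's subscribe_to option is what wires the two stages together: once both processes are started under a supervisor, GenStage handles the demand negotiation between them automatically.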

Create our Mix project

First, we’ll create a new application with a supervision tree, as we’ve done before. We will name it scraper and pretend we’re going to scrape data from web pages. We can see this below:

mix new scraper --sup

We have already created this application on the backend for you, so there's no need to run the command above. It creates a project named scraper. We have also added gen_stage as a dependency in mix.exs:

Elixir
# file path -> scraper/mix.exs
# Add this code at the place indicated in the comments of
# scraper/mix.exs in the playground widget.
defp deps do
  [
    {:gen_stage, "~> 1.0"}
  ]
end

Then, we run the mix deps.get command to fetch the dependency.