Build the Data Processing Pipeline
Explore how to build a data processing pipeline with Elixir's GenStage by creating producer and consumer stages. Understand GenStage callbacks like handle_demand and handle_events to manage event flow, and learn how to set up dynamic subscriptions. This lesson helps you build an efficient pipeline for concurrent data scraping and processing.
Complex use cases may require a data processing pipeline with one or more producers, several producer-consumers in between, and a consumer stage at the end. However, the main principles stay the same regardless of the number of stages, so we'll start with a simple two-stage pipeline and demonstrate how it works.
We will build a fake service that scrapes data from web pages—normally an intensive task dependent on system resources and a reliable network connection. Our goal is to request a number of URLs to be scraped and have the data pipeline take care of the workload.
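To give a sense of where we're heading, below is a minimal sketch of what a two-stage pipeline could look like. The module names (PageProducer and PageConsumer) and the logging are placeholders chosen for illustration; we'll build the real stages step by step in this lesson.

defmodule PageProducer do
  use GenStage

  def start_link(_args) do
    # Register the producer under its module name so consumers can find it.
    GenStage.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  def init(:ok) do
    # Start as a producer with an empty state; no events are queued yet.
    {:producer, []}
  end

  def handle_demand(demand, state) do
    # Called when a consumer asks for events; we return none for now.
    IO.puts("PageProducer received demand for #{demand} pages")
    {:noreply, [], state}
  end
end

defmodule PageConsumer do
  use GenStage

  def start_link(_args) do
    GenStage.start_link(__MODULE__, :ok)
  end

  def init(:ok) do
    # Subscribe to the producer as soon as the consumer starts.
    {:consumer, :ok, subscribe_to: [PageProducer]}
  end

  def handle_events(events, _from, state) do
    # Pretend to scrape each page; a consumer never emits events itself.
    Enum.each(events, fn url -> IO.puts("PageConsumer is processing #{url}") end)
    {:noreply, [], state}
  end
end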
Create our mix project
First, we’ll create a new application with a supervision tree, as we’ve done before. We’ll name it scraper to match the data-scraping service we’re pretending to build:
mix new scraper --sup
We have already created the application on the backend for you, so there’s no need to run the command above; it would generate a new project named scraper. We have also added gen_stage as a dependency in mix.exs:
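For reference, the deps function in mix.exs now includes gen_stage. The exact version constraint below is only an example and may differ in your copy of the project:

defp deps do
  [
    {:gen_stage, "~> 1.2"}
  ]
end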
Then, we run the mix deps.get command to fetch the gen_stage dependency.
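If you’re running the project locally rather than on the backend, a typical workflow is to fetch and then compile the dependencies (these are standard Mix commands, nothing specific to this project):

mix deps.get
mix compile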