Complex use cases may require a data processing pipeline with one or more producers, several producer-consumers in between, and a consumer stage at the end. However, the main principles stay the same, so we'll start with a two-stage pipeline and demonstrate how it works.
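To make the shape of such a pipeline concrete, here is a minimal sketch of a two-stage GenStage pipeline: a producer that emits an increasing sequence of integers and a consumer that prints them. The Counter and Printer module names are ours for illustration only and are not part of the scraper project we build below.

defmodule Counter do
  use GenStage

  # Producer stage: the state is the next integer to emit.
  def init(counter), do: {:producer, counter}

  # Called when a consumer asks for events; we generate exactly
  # `demand` integers and advance the counter.
  def handle_demand(demand, counter) when demand > 0 do
    events = Enum.to_list(counter..(counter + demand - 1))
    {:noreply, events, counter + demand}
  end
end

defmodule Printer do
  use GenStage

  # Consumer stage: receives events and never emits any.
  def init(:ok), do: {:consumer, :ok}

  def handle_events(events, _from, state) do
    Enum.each(events, &IO.inspect/1)
    # Consumers always return an empty list of events.
    {:noreply, [], state}
  end
end

# Wire the two stages together:
{:ok, producer} = GenStage.start_link(Counter, 0)
{:ok, consumer} = GenStage.start_link(Printer, :ok)
GenStage.sync_subscribe(consumer, to: producer)

Once the consumer subscribes, it sends demand to the producer, and events start flowing downstream at a rate the consumer controls.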

We will build a fake service that scrapes data from web pages—normally an intensive task dependent on system resources and a reliable network connection. Our goal is to request a number of URLs to be scraped and have the data pipeline take care of the workload.
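Since scraping itself is not the focus, the expensive work can be simulated. Below is a minimal sketch of such a stand-in; the Scraper module and its scrape/1 function are hypothetical here, and the actual implementation in this course may differ.

defmodule Scraper do
  # Hypothetical stand-in for real scraping work: sleep for up to
  # a second to mimic network latency, then return a fake result
  # for the given URL.
  def scrape(url) do
    Process.sleep(:rand.uniform(1_000))
    {:ok, "scraped data for #{url}"}
  end
end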

Create our Mix project

First, we’ll create a new application with a supervision tree, as we’ve done before. We’ll name it scraper, since it will pretend to scrape data from web pages. The command is shown below:

mix new scraper --sup

This command creates a project named scraper. We have already created the application on the backend for you, so there’s no need to run the command yourself. We have also added gen_stage as a dependency in mix.exs:
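For reference, the relevant part of mix.exs would look like the following; the exact version requirement shown here is an assumption on our part:

defp deps do
  [
    # gen_stage provides the producer/consumer building blocks
    # for our data processing pipeline.
    {:gen_stage, "~> 1.0"}  # version requirement assumed
  ]
end

After adding the dependency, running mix deps.get fetches gen_stage so the project compiles.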
