Create Multi-stage Data Pipelines

Learn when multi-stage data pipelines are needed, then implement one by adding a producer-consumer stage to our scraper project.

We’ve already demonstrated how :producer and :consumer stages work in practice. The only type of stage we haven’t seen in action yet is the :producer_consumer stage. Producer-consumer stages are the key to building arbitrarily long data processing pipelines. The good news is that if we understand how producers and consumers work, we already know how producer-consumers work: a producer-consumer receives events like a consumer and emits events like a producer.
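To make the idea concrete, here is a minimal three-stage pipeline sketch. It is not part of the lesson's scraper project; the module names `Counter`, `Doubler`, and `Printer` are illustrative, and it assumes the `gen_stage` dependency is available. The middle stage implements `handle_events/3` like a consumer but returns the transformed events so they flow downstream like a producer.

```elixir
defmodule Counter do
  use GenStage

  def start_link(initial), do: GenStage.start_link(__MODULE__, initial, name: __MODULE__)

  # A :producer emits events in response to demand.
  def init(counter), do: {:producer, counter}

  def handle_demand(demand, counter) when demand > 0 do
    events = Enum.to_list(counter..(counter + demand - 1))
    {:noreply, events, counter + demand}
  end
end

defmodule Doubler do
  use GenStage

  def start_link(_), do: GenStage.start_link(__MODULE__, :ok, name: __MODULE__)

  # A :producer_consumer subscribes upstream and emits events downstream.
  def init(:ok), do: {:producer_consumer, :ok, subscribe_to: [Counter]}

  # Received events are transformed and re-emitted, not discarded.
  def handle_events(events, _from, state) do
    {:noreply, Enum.map(events, &(&1 * 2)), state}
  end
end

defmodule Printer do
  use GenStage

  def start_link(_), do: GenStage.start_link(__MODULE__, :ok)

  # A :consumer sits at the end of the pipeline and emits no events.
  def init(:ok), do: {:consumer, :ok, subscribe_to: [{Doubler, max_demand: 10}]}

  def handle_events(events, _from, state) do
    Enum.each(events, &IO.inspect/1)
    {:noreply, [], state}
  end
end
```

Notice that `Doubler` looks almost identical to a consumer; the only differences are the `:producer_consumer` tag in `init/1` and the non-empty event list in the `{:noreply, events, state}` return value.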

A word of caution

When we learn to add stages and extend our data pipelines, we may be tempted to organize our business logic using stages rather than plain Elixir modules and functions. As the GenStage documentation warns us, this is an anti-pattern:

“If our domain has to process the data in multiple steps, we should write that logic in separate modules and not directly in a GenStage. We only add stages according to the runtime needs, typically when we need to provide back-pressure or leverage concurrency.”

Use plain functions

A good rule of thumb is to always start with plain functions. When we recognize the need to use back-pressure, we create a two-stage data pipeline first. As we will see in a moment, adding more stages is easy, so we can extend it gradually when we spot an opportunity to improve.

Business logic

First, we need to add some business logic that justifies adding another stage. We open scraper.ex and add the following function:
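The actual function from the project is not shown in this excerpt. As a stand-in, a hypothetical helper of this shape would justify a new stage: it is slow and I/O-like, so running many of these checks concurrently with back-pressure is worthwhile. The name `online?/1` and its behavior are assumptions, not the lesson's code.

```elixir
defmodule Scraper do
  # Hypothetical helper: pretend to check whether a page is reachable.
  # Sleeps to simulate network latency, then returns a boolean,
  # succeeding roughly two times out of three.
  def online?(_url) do
    Process.sleep(Enum.random(100..300))
    Enum.random([false, true, true])
  end
end
```

Because the function is slow but pure from the caller's point of view, it can later be moved behind a producer-consumer stage without changing its interface.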
