The scraper project

The scraper project's data-processing pipeline consists of the following components (a supervision-tree sketch follows the list):

  • one PageProducer process of the :producer type;

  • two OnlinePageConsumerProducer processes of the :producer_consumer type;

  • one PageConsumerSupervisor process of the :consumer type;

  • up to two PageConsumer processes, started on demand by PageConsumerSupervisor.
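
To make this topology concrete, here is a minimal sketch of how these components might be started under the application's supervisor. Only the module names come from the project; the child ids, arguments, and supervisor options are assumptions:

  children = [
    PageProducer,
    # Two children of the same stage module need distinct child ids;
    # the :opc_* ids here are hypothetical.
    Supervisor.child_spec({OnlinePageConsumerProducer, []}, id: :opc_1),
    Supervisor.child_spec({OnlinePageConsumerProducer, []}, id: :opc_2),
    PageConsumerSupervisor
  ]

  Supervisor.start_link(children, strategy: :one_for_one)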

Flow with GenStage

To demonstrate how Flow works with GenStage, we’ll rewrite the original OnlinePageConsumerProducer implementation using Flow. Flow provides two groups of functions for integrating with GenStage. The first group works with stages that are already running:

  • from_stages/2 receives events from :producer stages.

  • through_stages/3 sends events to :producer_consumer stages and receives what they emit in turn.

  • into_stages/3 sends events to :consumer or :producer_consumer stages.

All functions in this group take a list of process ids (PIDs) identifying the already-running processes to connect to, as the sketch below shows.
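
Here is a minimal sketch of all three functions working together. The stage modules and registered names (SomeProducer and so on) are assumptions for illustration, not part of the scraper project:

  producer = Process.whereis(SomeProducer)
  prod_cons = Process.whereis(SomeProducerConsumer)
  consumer = Process.whereis(SomeConsumer)

  {:ok, _flow_pid} =
    Flow.from_stages([producer])         # receive events from a :producer
    |> Flow.through_stages([prod_cons])  # round-trip through a :producer_consumer
    |> Flow.into_stages([consumer])      # deliver events to a :consumer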

The second group of functions is useful when we want Flow to start the GenStage processes for us. These are the functions:

  • from_specs/2

  • through_specs/3

  • into_specs/3

They work the same way as those in the previous group, except they take a list of tuples instead of a list of PIDs. Each tuple is a child specification, similar to the child specifications we use with supervisors; for example, {Module, args}. Flow uses the child specification to start the processes for us.
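
As a sketch, assuming hypothetical SomeProducer and SomeConsumer stage modules, letting Flow start the stages itself might look like this:

  {:ok, _flow_pid} =
    Flow.from_specs([{SomeProducer, []}])
    # Per the Flow docs, each consumer spec is paired with its
    # subscription options, here an empty list.
    |> Flow.into_specs([{{SomeConsumer, []}, []}])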

We plan to use from_stages/2 and into_stages/3 to connect to the already running PageProducer and PageConsumerSupervisor. We receive new page events, filter out pages whose websites are offline, and send the rest to PageConsumerSupervisor, just like before. Let’s get started.
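
In outline, the rewritten pipeline could look like the following sketch. The registered process names come from the project, but the subscription options and the Scraper.online?/1 helper are assumptions we'll flesh out as we go:

  producer = Process.whereis(PageProducer)
  consumer_sup = Process.whereis(PageConsumerSupervisor)

  {:ok, _flow_pid} =
    Flow.from_stages([producer], max_demand: 1)
    |> Flow.filter(&Scraper.online?/1)  # assumed existing online-check helper
    |> Flow.into_stages([consumer_sup])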

Add flow to the scraper project

First, let’s add flow to the scraper project:

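In mix.exs, add flow to the dependency list (the version requirements below are assumptions; check hex.pm for the latest releases):

  defp deps do
    [
      {:gen_stage, "~> 1.0"},  # already used by the scraper project
      {:flow, "~> 1.0"}        # new dependency
    ]
  end

Then run mix deps.get to fetch it.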