Pipeline configuration

We’ll use Broadway’s processors to refactor the logic that checks each website. To do this, we define the :processors option in start_link/1 and implement the handle_message/3 callback:

def start_link(_args) do
  options = [
    name: ScrapingPipeline,
    producer: [
      module: {PageProducer, []},
      transformer: {ScrapingPipeline, :transform, []}
    ],
    processors: [
      default: [max_demand: 1, concurrency: 2]
    ]
  ]

  Broadway.start_link(__MODULE__, options)
end
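
With max_demand: 1, each processor asks the producer for one message at a time, and concurrency: 2 means that up to two processor processes work in parallel. To build intuition for that bound, here is a minimal, dependency-free sketch that uses Task.async_stream/3 from the standard library instead of Broadway. It is only an analogue, not how Broadway is implemented. The ConcurrencySketch module, its check_all/2 function, the example URLs, and the fake online?/1 check are all assumptions made for illustration.

```elixir
# Dependency-free analogue of `processors: [default: [concurrency: 2]]`.
# This is NOT Broadway; it only illustrates bounded concurrency.
defmodule ConcurrencySketch do
  # Hypothetical stand-in for Scraper.online?/1: sleeps to mimic a slow
  # network call, then treats any https URL as "online".
  def online?(url) do
    Process.sleep(100)
    String.starts_with?(url, "https://")
  end

  # Checks every URL with at most `concurrency` checks running at once.
  # Results come back in input order as {url, online?} tuples.
  def check_all(urls, concurrency \\ 2) do
    urls
    |> Task.async_stream(&{&1, online?(&1)},
      max_concurrency: concurrency,
      timeout: 5_000
    )
    |> Enum.map(fn {:ok, result} -> result end)
  end
end

# Made-up URLs, used only to exercise the sketch.
urls = [
  "https://a.example",
  "http://b.example",
  "https://c.example",
  "https://d.example"
]

{micros, results} = :timer.tc(fn -> ConcurrencySketch.check_all(urls) end)
IO.inspect(results)
IO.puts("took about #{div(micros, 1000)} ms")
```

Running it with a different concurrency value, for example ConcurrencySketch.check_all(urls, 1), should make the elapsed time grow, which mirrors the effect of lowering concurrency in the :processors option.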

def handle_message(_processor, message, _context) do
  if Scraper.online?(message.data) do
    # To do...
  else
    Broadway.Message.failed(message,

...
