Batch Model Pipeline

Introduction to batch model pipelines.

Cloud Dataflow provides a useful framework for scaling sklearn models up to massive datasets. Instead of loading all the input data into a single data frame, we can score each record individually in the process function of a DoFn and use Apache Beam to write the outputs to a data sink, such as BigQuery. As long as we have a way of distributing the model to the worker nodes, Dataflow can apply it in parallel across the whole dataset.
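The per-record scoring idea can be sketched roughly as follows. This is a minimal illustration rather than the course's actual pipeline: `ThresholdModel`, `load_model`, the `ApplyModel` class, and the field names `id`, `f1`, and `f2` are all hypothetical. The stand-in model only mimics the `predict` method of a fitted sklearn estimator, and in a real job `load_model` would fetch the trained model from somewhere every worker can reach. The import fallback exists only so the sketch can be tried without Beam installed.

```python
try:
    import apache_beam as beam
    _DoFnBase = beam.DoFn
except ImportError:
    # Fallback so the scoring logic can be exercised without Beam installed.
    beam = None
    _DoFnBase = object


class ThresholdModel:
    """Hypothetical stand-in for a fitted sklearn model (only predict())."""

    def predict(self, rows):
        # Label a row 1 when the mean of its features exceeds 0.5.
        return [1 if sum(row) / len(row) > 0.5 else 0 for row in rows]


def load_model():
    """Hypothetical loader; a real job might read a pickled model from storage."""
    return ThresholdModel()


class ApplyModel(_DoFnBase):
    """Scores one record at a time, loading the model once per worker instance."""

    def __init__(self, model_loader):
        super().__init__()
        self._model_loader = model_loader
        self._model = None

    def setup(self):
        # Beam calls setup() once per DoFn instance, before any records arrive.
        self._model = self._model_loader()

    def process(self, element):
        if self._model is None:  # covers direct calls outside a Beam runner
            self.setup()
        features = [element["f1"], element["f2"]]  # assumed feature columns
        prediction = self._model.predict([features])[0]
        yield {"id": element["id"], "prediction": prediction}


def run_pipeline(records):
    """Runs the DoFn with Beam's default local runner; requires apache_beam."""
    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "Create" >> beam.Create(records)
            | "Score" >> beam.ParDo(ApplyModel(load_model))
            # A production job would write to a sink such as BigQuery here.
            | "Print" >> beam.Map(print)
        )
```

Loading the model in `setup` rather than in `process` means the cost of loading is paid once per worker instance instead of once per record, which is one common way to handle the "distributing the model" requirement described above.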
