Introduction to Cloud Dataflow and Batch Modeling
Introduction to streaming model workflows.
What is Dataflow?
Dataflow is a tool for building data pipelines that can run locally or scale up to large clusters in a managed environment. While Cloud Dataflow was initially incubated at Google as a GCP-specific tool, it now builds on the open-source Apache Beam library, so pipelines written for it can also run in other cloud environments.
The tool provides:
- input connectors to different data sources, such as BigQuery and files on Cloud Storage
- operators for transforming and aggregating data
- output connectors to systems such as Cloud Datastore and BigQuery
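The three pieces above (input, transform, output) can be sketched with the Apache Beam Python SDK. This is a minimal illustration rather than the chapter's pipeline: the function names `parse_line`, `format_total`, and `run_local`, the `user,amount` record format, and the output prefix are all hypothetical. An in-memory list stands in for an input connector such as BigQuery, and a local text file stands in for an output connector. It assumes the `apache-beam` package is installed.

```python
def parse_line(line):
    """Transform step: turn a 'user,amount' string into a (user, amount) pair."""
    user, amount = line.split(",")
    return user.strip(), float(amount)


def format_total(pair):
    """Render a (user, total) pair as a CSV line for the output step."""
    user, total = pair
    return f"{user},{total:.2f}"


def run_local(lines, output_prefix):
    """Build and run the pipeline on the local machine (hypothetical helper)."""
    # Imported here so the helpers above stay usable without Beam installed.
    import apache_beam as beam

    # With no options given, Beam uses its local DirectRunner.
    with beam.Pipeline() as pipeline:
        (
            pipeline
            # Input: an in-memory list standing in for an input connector.
            | "Read" >> beam.Create(lines)
            # Transform: parse each record into a key/value pair.
            | "Parse" >> beam.Map(parse_line)
            # Aggregate: sum the amounts for each user.
            | "SumPerUser" >> beam.CombinePerKey(sum)
            | "Format" >> beam.Map(format_total)
            # Output: a local text file standing in for an output connector.
            | "Write" >> beam.io.WriteToText(output_prefix)
        )
```

Calling `run_local(["alice,3.50", "bob,1.25", "alice,1.50"], "totals")` should write one line per user to files whose names start with `totals`. The same pipeline definition can in principle be submitted to the managed service by passing pipeline options that select the Dataflow runner instead of the local default.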
In this chapter, we’ll build a pipeline with Dataflow that reads in data from BigQuery, applies a sklearn model to create predictions, and writes the ...