Natality Streaming
Explore how to build streaming model pipelines using Google Cloud PubSub and Dataflow. Understand how to consume streaming data, apply predictive models, and store results in Datastore. Gain hands-on experience creating scalable real-time machine learning workflows with the Natality dataset.
PubSub can be used to provide data sources and data sinks within a Dataflow pipeline, where a consumer is a data source and a publisher is a data sink.
Example
We’ll reuse the Natality dataset to create a pipeline with Dataflow, but for the streaming version, we’ll use a PubSub consumer as the input data source rather than a BigQuery result set.
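As a rough illustration of what using a PubSub consumer as the input source can look like, here is a minimal sketch built on Apache Beam's `ReadFromPubSub` transform. The function names `subscription_path` and `build_pipeline`, and the project and subscription values, are hypothetical placeholders and not the course's own code. Reading from a subscription rather than a topic is also an assumption.

```python
def subscription_path(project, subscription):
    """Build the fully qualified Pub/Sub subscription path that Beam expects."""
    return f"projects/{project}/subscriptions/{subscription}"


def build_pipeline(project, subscription, argv=None):
    """Return a streaming pipeline and a PCollection of decoded messages.

    Beam is imported lazily so subscription_path stays usable without it.
    Requires the apache-beam[gcp] package.
    """
    import apache_beam as beam
    from apache_beam.options.pipeline_options import (
        PipelineOptions,
        StandardOptions,
    )

    options = PipelineOptions(argv)
    # Pub/Sub is an unbounded source, so the pipeline must run in streaming mode.
    options.view_as(StandardOptions).streaming = True

    pipeline = beam.Pipeline(options=options)
    messages = (
        pipeline
        # ReadFromPubSub emits each message payload as bytes by default.
        | "ReadMessages" >> beam.io.ReadFromPubSub(
            subscription=subscription_path(project, subscription)
        )
        | "DecodeUtf8" >> beam.Map(lambda payload: payload.decode("utf-8"))
    )
    return pipeline, messages
```

The model application and Datastore steps would then be attached to `messages` before calling `pipeline.run()`.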
Defining functions
For the output, we’ll publish predictions to Datastore and reuse the publish DoFn from the previous chapter.
The code snippet above shows the function we’ll use to perform the model application in the streaming pipeline. This function matches the one we defined in Cloud Dataflow and Batch Modeling, with one modification: the json.loads function is used to convert the passed-in string into a dictionary object. ...
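The json.loads step described here can be sketched in isolation as follows. This is not the course's original function: the names `parse_message` and `apply_model`, the `FEATURES` list, and the `predict` callable standing in for a trained model are all assumptions made for illustration.

```python
import json

# Hypothetical feature columns; the real pipeline's feature set may differ.
FEATURES = ["mother_age", "gestation_weeks", "plurality"]


def parse_message(message):
    """Convert a PubSub payload (bytes or str) into a dictionary."""
    if isinstance(message, bytes):
        message = message.decode("utf-8")
    return json.loads(message)


def apply_model(message, predict):
    """Parse one streamed message and return a one-element list of predictions.

    `predict` is any callable that takes a list of feature rows and returns
    a list of predictions, standing in for a trained model's predict method.
    """
    record = parse_message(message)
    # Treat missing or null features as 0 so the model always gets a full row.
    row = [float(record.get(name) or 0) for name in FEATURES]
    weight = predict([row])[0]
    # Returning a list mirrors how a Beam DoFn emits zero or more outputs.
    return [{
        "guid": record["guid"],
        "weight": weight,
        "time": str(record["time"]),
    }]
```

In a batch pipeline the element already arrives as a dictionary, so only the streaming version needs the parsing step before the model is applied.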