Natality Streaming
Creating a pipeline with Dataflow using the Natality dataset.
PubSub can serve as both a data source and a data sink within a Dataflow pipeline: a consumer, which reads messages from a topic, acts as the data source, and a publisher, which writes messages to a topic, acts as the data sink.
Example
We’ll reuse the Natality dataset to create a pipeline with Dataflow, but for the streaming version, we’ll use a PubSub consumer as the input data source rather than a BigQuery result set.
Defining functions
For the output, we’ll publish predictions to Datastore, reusing the publish DoFn from the previous chapter.
The code snippet above shows the function we’ll use to perform the model application in the streaming pipeline. This function is the same as the one we defined in Cloud Dataflow and Batch Modeling, with one modification: json.loads is used to convert the passed-in string into a dictionary object. ...
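As a rough illustration of that modification, here is a minimal sketch of a model-application step that first parses the incoming message with json.loads. It is not the lesson's actual function: the class name `ApplyModel`, the `StubModel` stand-in, and the feature names in `FEATURES` are all assumptions made for this example, and the class only mimics the shape of a Beam DoFn (a `process` method that yields results) so that it runs without apache_beam installed.

```python
import json

# Assumed feature names for illustration only; the lesson's real column
# list may differ.
FEATURES = ["mother_age", "gestation_weeks"]


class StubModel:
    """Hypothetical stand-in for the trained model loaded in the real pipeline."""

    def predict(self, rows):
        # Returns the sum of each row's features as a placeholder prediction.
        return [sum(row) for row in rows]


class ApplyModel:
    """Mimics the shape of a Beam DoFn without depending on apache_beam."""

    def __init__(self):
        self._model = None

    def process(self, element):
        # Create the model lazily, on the first message processed.
        if self._model is None:
            self._model = StubModel()

        # Streaming messages arrive as a JSON string (or bytes), not as a
        # dictionary, so parse the payload before reading any fields.
        record = json.loads(element)

        features = [[record[name] for name in FEATURES]]
        weight = self._model.predict(features)[0]

        # Emit one prediction record per incoming message.
        yield {"guid": record["guid"], "weight": weight}


if __name__ == "__main__":
    message = json.dumps(
        {"guid": "abc", "mother_age": 30, "gestation_weeks": 40}
    )
    for prediction in ApplyModel().process(message):
        print(prediction)
```

In a batch pipeline the element would already be a dictionary, so the json.loads call is the only step here that is specific to the streaming case; everything after it works on an ordinary dictionary.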