- Model Pipeline

Using GCP components with PySpark to build a pipeline.

To read in the natality dataset, we can use the read function with the Avro setting to fetch the dataset. Since we are using the Avro format, the dataframe will be lazily loaded and the data is not retrieved until the display command is used to sample the dataset, as shown in the snippet below:

Get hands-on with 1200+ tech skills courses.