Productizing PySpark
Explore how to productize PySpark pipelines by scheduling batch jobs with workflow, cloud, and vendor tools. Understand the differences between ephemeral and persistent clusters and how to implement quality checks and monitoring to ensure reliable, scalable model pipelines in production environments.
We'll cover the following...
Scheduling
Once you’ve tested a batch model pipeline in a notebook environment, there are several ways to schedule it to run at regular intervals.
For example, you may want a churn prediction model for a mobile game to run every morning and publish the scores ...
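Whichever scheduler is used, a daily job like the churn example is easier to run unattended if the notebook logic is wrapped in a script that takes the run date as a parameter. The following is a minimal sketch of such an entry point, not a definitive implementation. The function names, the `--run-date` flag, the `/tmp/churn_scores` base path, the `dt=` folder layout, and the choice to default to the previous day are all assumptions made for illustration. The PySpark work itself is left as a comment so the sketch runs with the standard library alone.

```python
import argparse
from datetime import date, timedelta


def parse_args(argv=None):
    """Parse the run date a scheduler would pass on each invocation."""
    parser = argparse.ArgumentParser(description="Daily batch scoring job (sketch)")
    parser.add_argument(
        "--run-date",
        type=date.fromisoformat,  # expects YYYY-MM-DD
        default=None,
        help="Date to score, YYYY-MM-DD; defaults to the previous day",
    )
    return parser.parse_args(argv)


def resolve_run_date(run_date=None, today=None):
    """Assumption: a morning run scores the previous, complete day of data."""
    if run_date is not None:
        return run_date
    today = today or date.today()
    return today - timedelta(days=1)


def output_path(base, run_date):
    """Hypothetical layout: one output folder per scored day."""
    return f"{base.rstrip('/')}/dt={run_date.isoformat()}"


def main(argv=None):
    args = parse_args(argv)
    run_date = resolve_run_date(args.run_date)
    path = output_path("/tmp/churn_scores", run_date)  # hypothetical base path
    # The PySpark logic tested in the notebook would go here: load the data
    # for run_date, apply the model, and write the scores to `path`.
    print(f"Scoring {run_date} -> {path}")
    return path


# Example invocation with an explicit date, as a scheduler might pass it.
main(["--run-date", "2024-01-15"])
```

Passing the date explicitly means a failed run can be repeated for the same day, and writing to a per-day folder keeps a rerun from mixing its output with other days.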