Conclusion : Working tools for model pipelines

This course explored using PySpark to build scalable batch model pipelines in cloud environments such as AWS and GCP. It covered how to set up environments, how to handle large datasets with data lakes, and the prediction-latency limitations of batch pipelines. This lesson prepares you for streaming pipelines, which reduce the latency of model predictions.
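As a minimal illustration of the batch-scoring pattern described above, the sketch below uses plain Python with a hypothetical dataset and a stand-in model rather than a real PySpark job. In a production pipeline these steps would operate on Spark DataFrames across a cluster, but the extract-score-load shape is the same, and it shows why batch pipelines have latency: every record waits for the next full run before it gets a prediction.

```python
# Sketch of a batch model pipeline (hypothetical data and stand-in model).
# A real PySpark pipeline would read from a data lake, score with a fitted
# model, and write predictions back out; the structure below mirrors that.

def extract(records):
    """Read the full batch of input records (e.g., from a data lake)."""
    return list(records)

def score(record):
    """Stand-in model: predict 1 when the feature sum is positive."""
    return 1 if sum(record["features"]) > 0 else 0

def run_batch(records):
    """Score every record in one pass and emit all predictions together."""
    batch = extract(records)
    return [{"id": r["id"], "prediction": score(r)} for r in batch]

if __name__ == "__main__":
    data = [
        {"id": 1, "features": [0.5, 1.2]},
        {"id": 2, "features": [-2.0, 0.3]},
    ]
    print(run_batch(data))
```

A streaming pipeline, by contrast, would apply `score` to each record as it arrives instead of waiting to accumulate and process a full batch.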

We'll cover the following...

PySpark is a powerful tool for data scientists to build scalable analyses and model pipelines. It is a highly desirable skill set for companies because it enables data science teams to own more of the model-building process and the resulting data products. There are a variety of ways to set up an environment for PySpark, and ...