Cloud Environments
Understand how to leverage cloud environments such as AWS and Google Cloud Platform to build scalable data science pipelines. Learn about virtual machines, managed tools, and databases including EC2, Lambda, Redshift, BigQuery, and Dataflow to create efficient batch and streaming model workflows in a cloud setting.
Scalable data science pipeline
To build scalable data science pipelines, it’s necessary to move beyond single-machine scripts to clusters of machines. While this is possible with an on-premises setup, a common trend is to use cloud computing environments for large-scale processing. Several options are available, with the top three platforms currently being Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure.
Most cloud platforms offer free credits for getting started. GCP offers a credit for new users to get hands-on with its tools, while AWS provides free-tier access to many services.
In this course, we’ll get hands-on with both AWS and GCP at little to no cost.