Spark Clusters

Distributing workloads in Spark clusters.

Spark environment

A Spark environment is a cluster of machines with a single driver node and zero or more worker nodes. The driver machine is the master node in the cluster and is responsible for coordinating the workloads performed across the cluster.
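As a rough illustration, the sketch below starts a Spark session from Python. The application name and the `local[*]` master URL (which runs the driver and workers on a single machine for testing) are assumptions for this example, not values from the lesson; in a real cluster the master URL would point at your cluster manager.

```python
# Minimal sketch: starting a Spark session that the driver node coordinates.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("cluster-demo")   # hypothetical application name
    .master("local[*]")        # assumption: local test cluster; replace with your cluster's master URL
    .getOrCreate()
)

# The driver coordinates work against whichever cluster this points to.
print(spark.sparkContext.master)
```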

Driver and worker nodes

In general, workloads are distributed across the worker nodes when performing operations on Spark dataframes. However, plain Python objects, such as lists or dictionaries, are instantiated only on the driver node and are not distributed.
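The following sketch contrasts a driver-local Python object with a distributed Spark dataframe. The column name and row values are made up for illustration, assuming a Spark session is available as shown above.

```python
# Sketch: driver-local Python objects vs. a distributed Spark dataframe.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A plain Python list lives only in the driver node's memory.
numbers = list(range(1000))

# Rows passed to createDataFrame are partitioned across the workers.
df = spark.createDataFrame([(n,) for n in numbers], ["n"])

# Transformations on the dataframe run on the workers, partition by partition.
doubled = df.selectExpr("n * 2 AS doubled")
print(doubled.rdd.getNumPartitions())  # number of partitions the workers operate on
```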

Ideally, you want all of your workloads to operate on worker nodes so that execution is distributed across the cluster rather than bottlenecked by the driver node. However, some types of operations in PySpark require the driver to perform all of the work, such as collecting an entire dataframe back to a single machine.
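As a sketch of the difference, the example below compares a distributed aggregation with operations that pull every row back to the driver. The column names and row counts are illustrative assumptions; `collect()` and `toPandas()` are standard PySpark calls that materialise results on the driver and can exhaust its memory on large dataframes.

```python
# Sketch: distributed aggregation vs. driver-bound operations.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.range(1_000_000).withColumn("squared", F.col("id") * F.col("id"))

# Distributed: the aggregation runs on the workers; only one summary row
# is returned to the driver.
summary = df.agg(F.sum("squared").alias("total")).collect()

# Driver-bound: every row is shipped to and held on the driver node.
local_rows = df.collect()
local_pdf = df.toPandas()  # builds the full pandas dataframe on the driver
```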
