Deploying a Machine Learning Model with Amazon SageMaker

Takes 90 mins

Amazon SageMaker is a managed AWS service for building, training, and deploying machine learning models. It provides an integrated Jupyter Notebook environment where data scientists can access data, analyze it and extract features, train a machine learning model, evaluate it, and deploy it to a hosted environment. SageMaker supports bringing your own machine learning algorithms and also ships with commonly used built-in algorithms that perform well on large datasets.

In this Cloud Lab, you’ll create a notebook instance in Amazon SageMaker, deploy a machine learning model from the notebook instance, and host the model on an endpoint. Moreover, you’ll use a Lambda function to access the endpoint. At the end of the Cloud Lab, you’ll use Amazon API Gateway to trigger the Lambda function with a payload and get predictions in response.

The following is the high-level architecture diagram of the infrastructure you’ll create in this Cloud Lab:

Discover the architecture of Amazon SageMaker for deploying and managing machine learning models

Why deploying an ML model is often harder than training it

Training a model is only part of the job. The real challenge shows up when you need to make predictions reliably in a real application. You need infrastructure, an inference endpoint, an interface your app can call, and a way to test and evolve the system without breaking everything.

That’s why Amazon SageMaker is a popular choice for deployment: it gives you managed building blocks for hosting models and exposing them to applications, without forcing you to hand-roll everything from scratch.

What you’re building in this Cloud Lab

This Cloud Lab focuses on a clean, practical deployment pipeline:

  • Build inside a managed notebook environment: You’ll create a notebook instance in Amazon SageMaker and use it as the starting point for your model deployment. This mirrors how many teams prototype and validate models before they formalize a production workflow.

  • Host the model behind an endpoint: Instead of running inference locally, you’ll host the model on a SageMaker endpoint. This is the key step that turns your model into a service that can handle prediction requests (a minimal deployment sketch follows this list).

  • Add a Lambda layer for simple inference access: Once you have an endpoint, you’ll create a Lambda function that invokes it. This is a common architectural move because Lambda gives you a lightweight “backend wrapper” (sketched after this list) where you can:

    • Validate inputs

    • Normalize request payloads

    • Control what you log

    • Return consistent responses to callers

  • Trigger inference through API Gateway: Finally, you’ll connect API Gateway to your Lambda function so you can send a request (with a payload) and get predictions back over HTTP. This gives you an end-to-end workflow that looks and feels like a real ML-powered API; a sample client call is sketched below.
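
To make the endpoint step concrete, here is a minimal sketch (not the Cloud Lab’s exact code) of hosting trained model artifacts behind a real-time endpoint from a SageMaker notebook. The S3 path, endpoint name, instance type, and the choice of the built-in XGBoost serving image are all assumptions:

```python
# A minimal sketch, not the Cloud Lab's exact code: host trained model
# artifacts behind a real-time SageMaker endpoint from a notebook instance.
import sagemaker
from sagemaker import image_uris
from sagemaker.model import Model

session = sagemaker.Session()
role = sagemaker.get_execution_role()  # available inside a SageMaker notebook

# Assumption: the model was trained with the built-in XGBoost algorithm.
image = image_uris.retrieve("xgboost", session.boto_region_name, version="1.7-1")

model = Model(
    image_uri=image,
    model_data="s3://your-bucket/path/to/model.tar.gz",  # hypothetical S3 path
    role=role,
    sagemaker_session=session,
)

# deploy() creates the model, the endpoint configuration, and the endpoint.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",             # instance type is an assumption
    endpoint_name="sagemaker-lab-endpoint",  # hypothetical endpoint name
)
```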
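
The Lambda wrapper in this architecture is typically a thin handler. Below is a minimal sketch of what such a handler might look like, assuming the hypothetical endpoint name above, a model that accepts CSV input, and an API Gateway proxy integration delivering the request body as a string:

```python
# A minimal sketch of a Lambda handler that wraps the SageMaker endpoint.
import json
import os

import boto3

runtime = boto3.client("sagemaker-runtime")
ENDPOINT_NAME = os.environ.get("ENDPOINT_NAME", "sagemaker-lab-endpoint")

def lambda_handler(event, context):
    # Validate the input: API Gateway proxies the request body as a string.
    body = json.loads(event.get("body") or "{}")
    features = body.get("data")
    if not features:
        return {"statusCode": 400,
                "body": json.dumps({"error": "missing 'data'"})}

    # Normalize the payload into the CSV row the model expects (assumption).
    payload = ",".join(str(x) for x in features)

    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="text/csv",
        Body=payload,
    )
    prediction = response["Body"].read().decode("utf-8")

    # Return a consistent JSON response to callers.
    return {"statusCode": 200,
            "body": json.dumps({"prediction": prediction.strip()})}
```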
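
Once API Gateway is wired to the Lambda function, any HTTP client can request predictions. Here is a minimal client-side sketch using a placeholder invoke URL (API Gateway generates the real one when you deploy a stage):

```python
# A minimal sketch of a client calling the deployed prediction API.
import json
import urllib.request

URL = "https://abc123.execute-api.us-east-1.amazonaws.com/prod/predict"  # placeholder

payload = json.dumps({"data": [5.1, 3.5, 1.4, 0.2]}).encode("utf-8")
req = urllib.request.Request(
    URL,
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))  # e.g., {"prediction": "..."}
```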

Where this pattern shows up in real projects

The architecture you practice here is a foundation for:

  • Product features that call ML inference in real time (recommendations, scoring, categorization).

  • Internal services that expose predictions to multiple teams.

  • Prototypes that need a shareable endpoint for testing and demos.

  • Early-stage MLOps setups before you add monitoring, CI/CD, or automated retraining.

Once you’re comfortable with the basics, the next natural steps are adding authentication, monitoring, deployment automation, and tighter cost controls. But having a working “model served behind an API” baseline is the prerequisite for all of that.

What “done” looks like

By the end of the Cloud Lab, you should be able to explain and repeat the full flow:

  • A model is hosted on a SageMaker endpoint.

  • Lambda invokes the endpoint for inference.

  • API Gateway triggers Lambda with a request payload.

  • The system returns a prediction response that you can integrate into an app.

That’s the core deployment life cycle many teams use as a starting point. It’s simple, testable, and easy to extend.