Deployment and Orchestration of ML Workflows
Explore practical deployment and orchestration strategies for Amazon SageMaker ML workflows. Understand endpoint types for different inference needs, implement CI/CD pipelines with blue/green deployments, automate retraining with event triggers, and choose the right tools to minimize latency, cost, and operational complexity in production environments.
Question 36
A company operates a real-time fraud detection model that receives sporadic traffic with long idle periods. Sometimes, hours pass without a single request. When invoked, the model must respond with sub-second latency. The team wants to minimize costs while maintaining acceptable response times.
Which SageMaker endpoint type should the team use?
A. Deploy the model on a SageMaker real-time endpoint with a single ml.m5.large instance to guarantee consistently low latency at all times.
B. Deploy the model on a SageMaker Serverless Inference endpoint, accepting potential cold-start latency in exchange for automatic scale-to-zero during idle periods.
C. Deploy the model on a SageMaker Asynchronous Inference endpoint to queue requests and process them when capacity is available.
D. Deploy the model on a SageMaker Batch Transform job triggered by each incoming request to avoid maintaining any persistent infrastructure.
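For reference, the Serverless Inference deployment described in option B can be expressed in a few boto3 calls. The sketch below is illustrative only: the model name, endpoint names, and capacity settings are hypothetical, and it assumes a SageMaker Model named fraud-model has already been created.

```python
import boto3

sm = boto3.client("sagemaker")

# Hypothetical names; "fraud-model" must already exist as a SageMaker Model.
sm.create_endpoint_config(
    EndpointConfigName="fraud-serverless-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "fraud-model",
        # ServerlessConfig replaces InstanceType/InitialInstanceCount:
        # the endpoint scales to zero when idle and bills per invocation.
        "ServerlessConfig": {
            "MemorySizeInMB": 2048,  # valid values: 1024-6144, in 1 GB steps
            "MaxConcurrency": 20,    # concurrent invocations before throttling
        },
    }],
)

sm.create_endpoint(
    EndpointName="fraud-serverless",
    EndpointConfigName="fraud-serverless-config",
)
```

If cold-start latency turns out to be unacceptable for the fraud use case, serverless endpoints also support provisioned concurrency, which keeps a set amount of capacity warm at additional cost.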
Question 37
An ML team needs to deploy a computer vision model that processes high-resolution medical images. Each image can be up to 500 MB in size, and processing takes approximately three minutes per image. Clinicians submit individual images on demand throughout the day, and results do not need to be returned synchronously; they can be retrieved from storage once processing completes.
Which SageMaker inference option should the team use?
A. Deploy the model on a SageMaker real-time endpoint and increase the endpoint timeout to accommodate the three-minute processing time.
B. Deploy the model using SageMaker Batch Transform, triggered each time a new image is uploaded to Amazon S3.
C. Deploy the model on a SageMaker Asynchronous Inference endpoint, which supports payloads up to 1 GB and processing times up to 15 minutes.
D. Deploy the model on a SageMaker Serverless Inference endpoint to minimize costs during periods of low image submission volume.
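To make option C concrete, an Asynchronous Inference endpoint is configured through AsyncInferenceConfig and invoked with an S3 reference instead of an inline payload. The names, instance type, and bucket paths below are hypothetical, and the sketch assumes a SageMaker Model named imaging-model already exists.

```python
import boto3

sm = boto3.client("sagemaker")
runtime = boto3.client("sagemaker-runtime")

# Hypothetical names and buckets; "imaging-model" must already exist.
sm.create_endpoint_config(
    EndpointConfigName="imaging-async-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "imaging-model",
        "InstanceType": "ml.g4dn.xlarge",
        "InitialInstanceCount": 1,
    }],
    AsyncInferenceConfig={
        # Results land in S3; clinicians or a downstream job fetch them later.
        "OutputConfig": {"S3OutputPath": "s3://example-bucket/async-results/"},
        "ClientConfig": {"MaxConcurrentInvocationsPerInstance": 2},
    },
)
sm.create_endpoint(EndpointName="imaging-async",
                   EndpointConfigName="imaging-async-config")

# Invocation references the image in S3 rather than sending bytes inline,
# which is how async inference handles payloads far beyond real-time limits.
resp = runtime.invoke_endpoint_async(
    EndpointName="imaging-async",
    InputLocation="s3://example-bucket/uploads/scan-001.dcm",
    ContentType="application/dicom",
)
print(resp["OutputLocation"])  # where the result will appear when done
```

Because invoke_endpoint_async returns immediately with an OutputLocation, the caller can poll S3 for the finished result, or the endpoint's OutputConfig can include a NotificationConfig so completions and errors publish to SNS topics.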
Question 38
A retail company wants to deploy 50 different product recommendation models, one for each product category. Each model is relatively small (under 200 MB) and uses the same ML framework. Individual product categories receive low traffic, but the company wants all models available for inference at any time. The team needs to minimize endpoint infrastructure costs.
Which deployment approach should the team use?
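Although the answer choices for this question are cut off above, the scenario (many small models sharing one framework, each with low traffic) is the textbook fit for a SageMaker multi-model endpoint, where a single container serves many model artifacts from a shared S3 prefix. The sketch below is hypothetical throughout: the account ID, image URI, role ARN, endpoint names, and per-category .tar.gz layout are all placeholders.

```python
import boto3

sm = boto3.client("sagemaker")
runtime = boto3.client("sagemaker-runtime")

# Hypothetical names; the container image must support multi-model mode,
# and each category model lives as a .tar.gz under the shared S3 prefix.
sm.create_model(
    ModelName="recommender-mme",
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerRole",
    PrimaryContainer={
        "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/recommender:latest",
        "Mode": "MultiModel",  # one endpoint hosts many model artifacts
        "ModelDataUrl": "s3://example-bucket/recommenders/",  # shared prefix
    },
)

# ...create_endpoint_config / create_endpoint as usual, then route each
# request to a specific category model with TargetModel:
resp = runtime.invoke_endpoint(
    EndpointName="recommender-mme",
    TargetModel="electronics.tar.gz",  # loaded on demand, then cached
    ContentType="application/json",
    Body=b'{"user_id": "u-42"}',
)
```

Since each model is under 200 MB, dozens of them fit in memory on a single instance, and SageMaker loads and evicts artifacts on demand, which is what keeps the per-model infrastructure cost low.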