Implementation and Integration II
Understand and evaluate deployment options for generative AI applications on AWS, focusing on cost efficiency, scaling, and model routing. Learn to implement reliable, scalable architectures using Amazon Bedrock, SageMaker endpoints, serverless inference, and custom container deployments to meet various business and technical requirements for generative AI workloads.
Question 26
A company is building a customer support chatbot using Amazon Bedrock. Traffic is highly variable, ranging from zero users overnight to sudden spikes of up to 1,000 concurrent users during promotions. The company wants to minimize cost when idle while maintaining compatibility with the Amazon Bedrock API.
Which deployment approach is most appropriate?
A. Amazon Bedrock provisioned throughput
B. AWS Lambda invoking Amazon Bedrock on demand
C. SageMaker AI real-time endpoint with auto scaling
D. SageMaker AI serverless inference endpoint
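To make the scenario concrete, here is a minimal sketch of what "invoking Amazon Bedrock on demand from AWS Lambda" (option B) could look like. The model ID, request schema, and `prompt` field in the event are illustrative assumptions, not part of the question; the `bedrock-runtime` client's `invoke_model` call is the standard boto3 API.

```python
# Hypothetical sketch: a Lambda handler that forwards a user prompt to
# Amazon Bedrock on demand. With this pattern you pay per invocation,
# so cost is near zero when the chatbot is idle.
import json


def build_request(prompt: str, max_tokens: int = 512) -> str:
    """Build an Anthropic-messages-style Bedrock request body (assumed schema)."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })


def handler(event, context):
    # boto3 ships with the Lambda runtime; imported lazily here so the
    # module can be loaded and unit-tested without AWS dependencies.
    import boto3

    bedrock = boto3.client("bedrock-runtime")
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # illustrative model ID
        body=build_request(event["prompt"]),
    )
    answer = json.loads(response["body"].read())
    return {"statusCode": 200, "body": json.dumps(answer)}
```

Because Lambda scales out automatically per request and bills nothing while idle, this pattern fits the "zero overnight, spiky during promotions" traffic profile described above.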
Question 27
A company has fine-tuned a large language model for legal document analysis. The application is used intermittently by internal analysts throughout the day, with periods of low usage and occasional bursts of activity. The company wants to minimize cost when the system is idle while still supporting near–real-time inference when requests arrive. Occasional cold starts are acceptable, but the solution must automatically scale without manual capacity management.
Which deployment option best ...