
Implementation and Integration II

Explore deployment and integration techniques for generative AI applications on AWS, including cost optimization under variable traffic, dynamic model routing, scaling inference endpoints, and managing embedding model performance in production environments. Learn to choose appropriate AWS services and architectures for reliable, scalable AI deployment.

Question 26

A company is building a customer support chatbot using Amazon Bedrock. Traffic is highly variable, ranging from zero users overnight to sudden spikes of up to 1,000 concurrent users during promotions. The company wants to minimize cost when idle while maintaining compatibility with the Amazon Bedrock API.

Which deployment approach is most appropriate?

A. Amazon Bedrock provisioned throughput

B. AWS Lambda invoking Amazon Bedrock on demand

C. SageMaker AI real-time endpoint with auto scaling

D. SageMaker AI serverless inference endpoint
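For context on the on-demand pattern named in option B, the sketch below shows a minimal Lambda handler that invokes Amazon Bedrock through the Converse API, so no capacity is provisioned while the chatbot is idle. The model ID and event shape are illustrative assumptions, not part of the question.

```python
import json

# Assumed example model ID; any on-demand Bedrock model would work.
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"

def build_converse_request(model_id, user_message):
    """Build kwargs for the Bedrock Converse API (pure and testable)."""
    return {
        "modelId": model_id,
        "messages": [
            {"role": "user", "content": [{"text": user_message}]}
        ],
        "inferenceConfig": {"maxTokens": 512, "temperature": 0.2},
    }

def lambda_handler(event, context):
    """Invoke Bedrock on demand; billing is per request, not per hour."""
    import boto3  # imported lazily so the module loads without AWS deps
    client = boto3.client("bedrock-runtime")
    request = build_converse_request(MODEL_ID, event["message"])
    response = client.converse(**request)
    reply = response["output"]["message"]["content"][0]["text"]
    return {"statusCode": 200, "body": json.dumps({"reply": reply})}
```

Because both Lambda and on-demand Bedrock inference bill per invocation, this combination scales to zero between promotions with no capacity management.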

Question 27

A company has fine-tuned a large language model for legal document analysis. The application is used intermittently by internal analysts throughout the day, with periods of low usage and occasional bursts of activity. The company wants to minimize cost when the system is idle while still supporting near–real-time inference when requests arrive. Occasional cold starts are acceptable, but the solution must automatically scale without manual capacity management.

Which deployment option best ...
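One deployment pattern this scenario describes, scale-to-zero hosting for a fine-tuned model with acceptable cold starts, is SageMaker serverless inference. The sketch below builds a serverless endpoint configuration via boto3; the config, endpoint, and model names are hypothetical placeholders.

```python
def build_serverless_endpoint_config(config_name, model_name,
                                     memory_mb=4096, max_concurrency=20):
    """Pure helper: kwargs for sagemaker.create_endpoint_config with a
    ServerlessConfig, so no instances run while the endpoint is idle."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "ServerlessConfig": {
                "MemorySizeInMB": memory_mb,        # 1024-6144, multiples of 1024
                "MaxConcurrency": max_concurrency,  # per-endpoint burst limit
            },
        }],
    }

def deploy_serverless_endpoint(config_name, endpoint_name, model_name):
    """Create the config and endpoint; requires AWS credentials at runtime."""
    import boto3  # lazy import so the pure helper stays testable offline
    sm = boto3.client("sagemaker")
    sm.create_endpoint_config(
        **build_serverless_endpoint_config(config_name, model_name))
    sm.create_endpoint(EndpointName=endpoint_name,
                       EndpointConfigName=config_name)
```

With a serverless config there is no instance count to manage: SageMaker scales capacity with request volume, charges only for inference duration, and incurs a cold start after idle periods, matching the trade-offs the scenario describes.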