Operational Efficiency and Optimization for GenAI Applications I
Explore techniques to enhance operational efficiency in Generative AI applications on AWS. Learn how to optimize token consumption, enable parallel processing, implement effective caching, and reduce response latency while maintaining output quality. This lesson helps you apply practical solutions to improve cost-effectiveness and performance for real-world GenAI workloads using Amazon Bedrock and AWS services.
Question 51
A company operates a customer-facing GenAI chatbot built on Amazon Bedrock. After reviewing monthly cost reports, the team discovers that token usage has increased significantly. Analysis shows that repeated system instructions and verbose prompts are contributing to unnecessary token consumption. The company wants to reduce overall token costs by at least 40% without changing the underlying foundation model or degrading response quality.
Which approach will most effectively reduce token usage?
A. Increase the temperature parameter to encourage shorter responses.
B. Apply prompt compression and context pruning to remove redundant instructions and unused conversation history.
C. Enable Amazon Bedrock provisioned throughput to stabilize inference costs.
D. Replace the existing ...
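The correct approach here is option B: prompt compression and context pruning cut tokens sent on every request without touching the model. The sketch below illustrates the idea with two hypothetical helpers, `compress_prompt` (collapses redundant whitespace and blank lines) and `prune_history` (keeps only the most recent turns). The 4-characters-per-token estimate is a rough heuristic, not a real tokenizer, and no actual Amazon Bedrock call is made.

```python
# Sketch: prompt compression + context pruning (option B).
# Assumptions: compress_prompt, prune_history, and estimate_tokens are
# illustrative helpers, not part of any AWS SDK; token counts use a
# rough ~4-chars-per-token heuristic rather than a model tokenizer.
import re

def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token."""
    return max(1, len(text) // 4)

def compress_prompt(prompt: str) -> str:
    """Collapse repeated whitespace and drop blank lines."""
    lines = [re.sub(r"\s+", " ", ln).strip() for ln in prompt.splitlines()]
    return "\n".join(ln for ln in lines if ln)

def prune_history(history: list[dict], keep_last: int = 4) -> list[dict]:
    """Context pruning: keep only the most recent conversation turns."""
    return history[-keep_last:]

if __name__ == "__main__":
    system = "You are a helpful    assistant.\n\n\nAlways   be   concise.\n"
    history = [{"role": "user", "content": f"turn {i}"} for i in range(20)]

    compact = compress_prompt(system)
    recent = prune_history(history, keep_last=4)

    before = estimate_tokens(system) + sum(
        estimate_tokens(t["content"]) for t in history)
    after = estimate_tokens(compact) + sum(
        estimate_tokens(t["content"]) for t in recent)
    print(f"estimated tokens: {before} -> {after}")
```

In a real deployment you would apply these transforms to the payload before invoking the model (for example via `boto3`'s `bedrock-runtime` client), so every request carries a leaner system prompt and only the conversation turns that still matter.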