Testing, Validation, and Troubleshooting I
Explore effective methods for testing, validating, and troubleshooting generative AI applications built on AWS. Learn to apply Amazon Bedrock Model Evaluations, A/B testing, and retrieval-effectiveness measurements to ensure accuracy, relevance, and performance in real-world GenAI deployments.
Question 59
A company is rolling out a GenAI-powered FAQ assistant built on Amazon Bedrock. The team wants an automated way to assess whether model responses remain relevant, factually accurate, and fluent after prompt changes. The evaluation must not require custom model training and must scale to thousands of test prompts.
Which approach is most appropriate to implement this evaluation framework?
A. Store responses in Amazon S3 and calculate ROUGE and BLEU scores using a custom Lambda function.
B. Enable Amazon CloudWatch Logs and manually review sampled responses for ...