
Testing, Validation, and Troubleshooting I

Understand how to evaluate the accuracy, relevance, and reliability of generative AI models deployed on AWS. Learn evaluation techniques including automated testing with Bedrock Model Evaluations, troubleshooting common issues in RAG systems, and optimizing workflows for production readiness.
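As a rough illustration of the automated-evaluation workflow described above, the sketch below assembles the request payload for an automated (algorithm-scored) model evaluation job in Amazon Bedrock using boto3's `create_evaluation_job` API. The role ARN, S3 URIs, dataset name, and metric names are placeholder assumptions for illustration only; verify the exact field names and available built-in metrics against the current boto3 documentation before submitting a real job.

```python
import json

# Hypothetical identifiers -- replace with real resources in your account.
ROLE_ARN = "arn:aws:iam::123456789012:role/BedrockEvalRole"
DATASET_S3_URI = "s3://my-eval-bucket/prompts.jsonl"
OUTPUT_S3_URI = "s3://my-eval-bucket/results/"


def build_evaluation_request(job_name: str, model_id: str) -> dict:
    """Assemble a CreateEvaluationJob request for an automated evaluation.

    Automated evaluations score responses with built-in metrics, so no
    custom model training is required and the same job can run over
    thousands of test prompts stored in S3.
    """
    return {
        "jobName": job_name,
        "roleArn": ROLE_ARN,
        "evaluationConfig": {
            "automated": {
                "datasetMetricConfigs": [
                    {
                        "taskType": "QuestionAndAnswer",
                        "dataset": {
                            "name": "faq-regression-set",
                            "datasetLocation": {"s3Uri": DATASET_S3_URI},
                        },
                        # Assumed built-in metric names; check the Bedrock
                        # docs for the current list.
                        "metricNames": ["Builtin.Accuracy", "Builtin.Robustness"],
                    }
                ]
            }
        },
        "inferenceConfig": {
            "models": [{"bedrockModel": {"modelIdentifier": model_id}}]
        },
        "outputDataConfig": {"s3Uri": OUTPUT_S3_URI},
    }


request = build_evaluation_request(
    "faq-eval-v2", "anthropic.claude-3-haiku-20240307-v1:0"
)
print(json.dumps(request, indent=2))
# To actually submit (requires AWS credentials and permissions):
#   boto3.client("bedrock").create_evaluation_job(**request)
```

Because the payload is built separately from the API call, the same function can be reused to regenerate the evaluation after each prompt change and diff the scored results in S3.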

Question 59

A company is rolling out a GenAI-powered FAQ assistant built on Amazon Bedrock. The team wants an automated way to assess whether model responses remain relevant, factually accurate, and fluent after prompt changes. The evaluation must not require custom model training and should scale to thousands of test prompts.

Which approach is most appropriate to implement this evaluation framework?

A. Store responses in Amazon S3 and calculate ROUGE and BLEU scores using a custom Lambda function.

B. Enable Amazon CloudWatch Logs and manually review sampled responses for ...