Exam AIP-C01 Topic 3 Question 102 Discussion
Actual exam question for Amazon's AIP-C01 exam
Question #: 102
Topic #: 3
Question #: 102
Topic #: 3
A healthcare company is using Amazon Bedrock to build a Retrieval Augmented Generation (RAG) application that helps practitioners make clinical decisions. The application must achieve high accuracy for patient information retrievals, identify hallucinations in generated content, and reduce human review costs.
Which solution will meet these requirements?
Which solution will meet these requirements?
Suggested Answer: D Vote an answer
Option D is the correct solution because it directly addresses all three requirements: high retrieval accuracy, hallucination detection, and reduced human review costs. AWS recommends a layered evaluation strategy for high-stakes domains such as healthcare, where generative outputs must be both accurate and safe.
Using an automated LLM-as-a-judge evaluation enables scalable, consistent assessment of generated responses for factual grounding, relevance, and hallucination risk. This automated screening significantly reduces the number of responses that require manual inspection. Only responses that fall below defined quality thresholds or exhibit ambiguous behavior are escalated to targeted human reviews, which optimizes review effort and cost.
The use of Amazon Bedrock built-in evaluations provides standardized metrics specifically designed for RAG systems, including retrieval precision, faithfulness to source documents, and hallucination rates. These evaluations integrate directly with Amazon Bedrock knowledge bases and models, eliminating the need to build and maintain custom evaluation pipelines.
Option A focuses on entity extraction confidence, which does not reliably detect hallucinations in generative text. Option B requires maintaining and scaling a separate fine-tuned evaluation model, increasing complexity and cost. Option C is useful for regression testing but cannot detect hallucinations in real-world, open-ended clinical queries.
Therefore, Option D provides the most effective and operationally efficient approach to maintaining clinical- grade accuracy while minimizing human review effort.
Using an automated LLM-as-a-judge evaluation enables scalable, consistent assessment of generated responses for factual grounding, relevance, and hallucination risk. This automated screening significantly reduces the number of responses that require manual inspection. Only responses that fall below defined quality thresholds or exhibit ambiguous behavior are escalated to targeted human reviews, which optimizes review effort and cost.
The use of Amazon Bedrock built-in evaluations provides standardized metrics specifically designed for RAG systems, including retrieval precision, faithfulness to source documents, and hallucination rates. These evaluations integrate directly with Amazon Bedrock knowledge bases and models, eliminating the need to build and maintain custom evaluation pipelines.
Option A focuses on entity extraction confidence, which does not reliably detect hallucinations in generative text. Option B requires maintaining and scaling a separate fine-tuned evaluation model, increasing complexity and cost. Option C is useful for regression testing but cannot detect hallucinations in real-world, open-ended clinical queries.
Therefore, Option D provides the most effective and operationally efficient approach to maintaining clinical- grade accuracy while minimizing human review effort.
by Kelly at Feb 12, 2026, 10:06 AM
0
0
0
10
Comments
Upvoting a comment with a selected answer will also increase the vote count towards that answer by one. So if you see a comment that you already agree with, you can upvote it instead of posting a new comment.
Report Comment
Commenting
You can sign-up / login (it's free).