Exam AIP-C01 Topic 3 Question 102 Discussion

Actual exam question for Amazon's AIP-C01 exam
Question #: 102
Topic #: 3

A healthcare company is using Amazon Bedrock to build a Retrieval Augmented Generation (RAG) application that helps practitioners make clinical decisions. The application must achieve high accuracy for patient information retrievals, identify hallucinations in generated content, and reduce human review costs.
Which solution will meet these requirements?

A. Use Amazon Comprehend to analyze and classify RAG responses and to extract medical entities and relationships. Use AWS Step Functions to orchestrate automated evaluations. Configure Amazon CloudWatch metrics to track entity recognition confidence scores. Configure CloudWatch to send an alert when accuracy falls below specified thresholds.
B. Implement automated large language model (LLM)-based evaluations that use a specialized model that is fine-tuned for medical content to assess all responses. Deploy AWS Lambda functions to parallelize evaluations. Publish results to Amazon CloudWatch metrics that track relevance and factual accuracy.
C. Configure Amazon CloudWatch Synthetics to generate test queries that have known answers on a regular schedule, and track model success rates. Set up dashboards that compare synthetic test results against expected outcomes.
D. Deploy a hybrid evaluation system that uses an automated LLM-as-a-judge evaluation to initially screen responses and targeted human reviews for edge cases. Use a built-in Amazon Bedrock evaluation to track retrieval precision and hallucination rates.

Suggested Answer: D

Option D is the correct solution because it directly addresses all three requirements: high retrieval accuracy, hallucination detection, and reduced human review costs. AWS recommends a layered evaluation strategy for high-stakes domains such as healthcare, where generative outputs must be both accurate and safe.
Using an automated LLM-as-a-judge evaluation enables scalable, consistent assessment of generated responses for factual grounding, relevance, and hallucination risk. This automated screening significantly reduces the number of responses that require manual inspection. Only responses that fall below defined quality thresholds or exhibit ambiguous behavior are escalated to targeted human reviews, which optimizes review effort and cost.
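
The screen-then-escalate flow described above can be sketched roughly as follows. This is only an illustration, not an Amazon Bedrock API: `JudgeScore`, `needs_human_review`, `triage`, and the threshold values are hypothetical names and numbers, and the scores are assumed to come from some LLM-as-a-judge step that returns values between 0 and 1.

```python
from dataclasses import dataclass

# Hypothetical thresholds (assumptions); real values would be tuned per use case.
FAITHFULNESS_MIN = 0.8
RELEVANCE_MIN = 0.7
AMBIGUITY_BAND = 0.05  # scores this close to a threshold are treated as ambiguous


@dataclass
class JudgeScore:
    """Scores assumed to be produced by an LLM-as-a-judge step (range 0 to 1)."""
    response_id: str
    faithfulness: float
    relevance: float


def needs_human_review(score: JudgeScore) -> bool:
    """Escalate when a score is below its threshold or ambiguously close to it."""
    below = (score.faithfulness < FAITHFULNESS_MIN
             or score.relevance < RELEVANCE_MIN)
    ambiguous = (abs(score.faithfulness - FAITHFULNESS_MIN) <= AMBIGUITY_BAND
                 or abs(score.relevance - RELEVANCE_MIN) <= AMBIGUITY_BAND)
    return below or ambiguous


def triage(scores):
    """Split judged responses into auto-approved and human-review queues."""
    auto_approved, human_queue = [], []
    for score in scores:
        if needs_human_review(score):
            human_queue.append(score)
        else:
            auto_approved.append(score)
    return auto_approved, human_queue


if __name__ == "__main__":
    sample = [
        JudgeScore("r1", faithfulness=0.95, relevance=0.90),  # clearly passes
        JudgeScore("r2", faithfulness=0.50, relevance=0.90),  # below threshold
        JudgeScore("r3", faithfulness=0.82, relevance=0.90),  # near threshold
    ]
    auto, human = triage(sample)
    print("auto-approved:", [s.response_id for s in auto])
    print("human review:", [s.response_id for s in human])
```

In this sketch only the responses that fail or sit near a threshold reach reviewers, which is the mechanism by which the hybrid approach is expected to lower review cost.
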
The use of Amazon Bedrock built-in evaluations provides standardized metrics specifically designed for RAG systems, including retrieval precision, faithfulness to source documents, and hallucination rates. These evaluations integrate directly with Amazon Bedrock knowledge bases and models, eliminating the need to build and maintain custom evaluation pipelines.
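
To make the two tracked metrics concrete, here is a minimal sketch using their common textbook definitions. These are assumptions for illustration, not the formulas Amazon Bedrock uses internally; `retrieval_precision` and `hallucination_rate` are hypothetical helper names, and the relevance labels and grounded flags are assumed to be supplied by an upstream evaluation step.

```python
def retrieval_precision(retrieved_ids, relevant_ids):
    """Fraction of retrieved chunks that are labeled relevant."""
    if not retrieved_ids:
        return 0.0
    relevant = set(relevant_ids)
    hits = sum(1 for chunk_id in retrieved_ids if chunk_id in relevant)
    return hits / len(retrieved_ids)


def hallucination_rate(grounded_flags):
    """Fraction of responses flagged as not grounded in the retrieved sources."""
    if not grounded_flags:
        return 0.0
    ungrounded = sum(1 for grounded in grounded_flags if not grounded)
    return ungrounded / len(grounded_flags)


if __name__ == "__main__":
    # Two of the four retrieved chunks are relevant.
    print(retrieval_precision(["a", "b", "c", "d"], {"a", "c"}))
    # One of the four responses is flagged as ungrounded.
    print(hallucination_rate([True, True, False, True]))
```
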
Option A focuses on entity extraction confidence, which does not reliably detect hallucinations in generative text. Option B requires maintaining and scaling a separate fine-tuned evaluation model, increasing complexity and cost. Option C is useful for regression testing but cannot detect hallucinations in real-world, open-ended clinical queries.
Therefore, Option D provides the most effective and operationally efficient approach to maintaining clinical-grade accuracy while minimizing human review effort.

by Kelly at Feb 12, 2026, 10:06 AM
