Exam Data-Engineer-Associate Topic 2 Question 249 Discussion

Actual exam question for Amazon's Data-Engineer-Associate exam
Question #: 249
Topic #: 2

An ecommerce company collects daily customer transaction logs in CSV format and stores the logs in Amazon S3. The company uses Amazon Athena to scan a subset of attributes from the logs on the same day the company receives each log.
Query times are increasing because of increasing transaction volume. The company wants to improve query performance.
Which solution will meet these requirements with the SHORTEST query times?

A. Convert the CSV logs into multiple ORC files for better parallelism in Athena. Partition by date in Amazon S3. Use columnar pushdown filters.
B. Convert the CSV logs to JSON. Partition by date in Amazon S3. Use Athena with dynamic filtering to reduce data scans.
C. Convert the CSV logs to Avro. Partition by date in Amazon S3. Use Athena with projection-based partitioning.
D. Convert the CSV logs to a single Apache Parquet file for each day. Partition the data by date in Amazon S3. Use Athena with predicate pushdown filters.

Suggested Answer: D

Amazon Athena achieves the fastest query performance when data is stored in columnar formats such as Apache Parquet and when queries can take advantage of partition pruning and predicate pushdown.
Converting the CSV files to Parquet significantly reduces the amount of data scanned because Parquet stores data in a column-oriented layout. Because the company's queries touch only a subset of attributes, Athena reads only the required columns instead of entire rows. Predicate pushdown reduces query time further: Parquet keeps min/max statistics for each row group, so the engine can skip row groups that cannot match the filter instead of reading and then discarding them.
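To make the two effects above concrete, here is a minimal sketch in plain Python. It is a toy model, not the real Parquet format or Athena's reader: the names `build_row_groups` and `scan`, the sample columns, and the group size are all made up for illustration. It stores each column separately per row group, keeps min/max statistics, and counts how many values a query has to touch.

```python
# Toy model of a columnar file. This is NOT the real Parquet layout;
# all names and sample data here are hypothetical.

def build_row_groups(rows, group_size):
    """Split row dicts into row groups, storing each column separately."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        columns = {name: [r[name] for r in chunk] for name in chunk[0]}
        # Per-column min/max, similar in spirit to row-group statistics.
        stats = {name: (min(vals), max(vals)) for name, vals in columns.items()}
        groups.append({"columns": columns, "stats": stats})
    return groups


def scan(groups, wanted, filter_col, min_value):
    """Return rows with filter_col >= min_value and the number of values read."""
    values_read = 0
    matches = []
    needed = set(wanted) | {filter_col}
    for group in groups:
        _, highest = group["stats"][filter_col]
        if highest < min_value:
            continue  # predicate pushdown: skip the whole row group
        # Column pruning: only the needed columns are read.
        values_read += sum(len(group["columns"][c]) for c in needed)
        for i, value in enumerate(group["columns"][filter_col]):
            if value >= min_value:
                matches.append({c: group["columns"][c][i] for c in wanted})
    return matches, values_read


rows = [
    {"order_id": i, "customer": f"c{i % 3}", "amount": i * 10, "note": "x" * 20}
    for i in range(8)
]
groups = build_row_groups(rows, group_size=4)
result, values_read = scan(
    groups, wanted=["order_id", "amount"], filter_col="amount", min_value=50
)
total_values = len(rows) * len(rows[0])
print(f"read {values_read} of {total_values} values, {len(result)} rows matched")
```

In this toy data the first row group is skipped entirely because its largest `amount` is below the filter, and only two of the four columns are read from the second one. A row-oriented CSV scan would have to read every value in every row to answer the same query.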
Partitioning the data by date ensures that Athena scans only the relevant partitions for same-day queries, minimizing unnecessary data reads. Storing one Parquet file per day is efficient and avoids the overhead of managing excessive small files.
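As a rough illustration of the date partitioning described above, the sketch below builds Hive-style `dt=YYYY-MM-DD` prefixes and a same-day query string. The bucket name, table name, partition column, and helper functions are all assumptions made up for this example; nothing here calls AWS, and the pruning step only imitates what Athena does with partition metadata.

```python
from datetime import date

# All of these names are hypothetical placeholders for illustration.
BUCKET = "example-transaction-logs"
TABLE = "transactions"
PARTITION_COLUMN = "dt"


def partition_prefix(day):
    """Hive-style S3 prefix for one day's partition."""
    return f"s3://{BUCKET}/logs/{PARTITION_COLUMN}={day.isoformat()}/"


def object_uri(day, filename):
    """Full URI of a data file stored inside that day's partition."""
    return partition_prefix(day) + filename


def same_day_query(day, columns):
    """SQL text that selects a subset of columns from a single partition."""
    column_list = ", ".join(columns)
    return (
        f"SELECT {column_list} FROM {TABLE} "
        f"WHERE {PARTITION_COLUMN} = '{day.isoformat()}'"
    )


def prune(uris, day):
    """Keep only the objects that live under the requested day's prefix."""
    prefix = partition_prefix(day)
    return [u for u in uris if u.startswith(prefix)]


today = date(2026, 4, 11)
uris = [
    object_uri(date(2026, 4, 10), "part-0.parquet"),
    object_uri(today, "part-0.parquet"),
]
print(same_day_query(today, ["order_id", "amount"]))
print(prune(uris, today))
```

The filter on the partition column is what lets the engine ignore every other day's prefix, so the amount of data considered stays roughly constant as history accumulates.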
ORC is also a columnar format, but Parquet is the format more commonly recommended for Athena workloads in AWS exam guidance. JSON is a text-based format and Avro is a row-oriented binary format; neither lets Athena skip unneeded columns, so both lead to larger scans and slower queries.
Therefore, Option D provides the shortest query times and aligns with Athena performance best practices.

by dhruvbakshi316 at Apr 11, 2026, 03:18 PM

Comments

dhruvbakshi316

2026-04-11 15:18:22

Selected Answer: A

ORC is a columnar format. Since the company scans only a subset of attributes, Athena reads only the needed columns, dramatically reducing data scanned.

Multiple files enable Athena to parallelize reads across multiple workers simultaneously, which is critical for the shortest query times.

Partitioning by date allows Athena to skip irrelevant partitions (queries run on same-day data).

Columnar pushdown filters push filtering down to the storage layer, minimizing data read from S3.

The combination of parallelism + columnar format + partitioning + pushdown delivers the fastest query performance.
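The parallelism argument can be sketched with a deliberately simple model. Everything in it is an assumption for illustration: the function name `estimated_scan_time`, the worker count, the file sizes, and the throughput figure are made up, and each file is treated as one indivisible unit of work. Athena can also split Parquet and ORC files internally, so this model likely overstates the penalty of a single file.

```python
# Toy model only: each file is one unit of work that a single worker reads.
# Worker count, file sizes, and throughput are hypothetical numbers.

def estimated_scan_time(file_sizes_mb, workers, mb_per_second=100.0):
    """Assign files greedily to the least-loaded worker and return the
    seconds needed by the busiest worker."""
    loads = [0.0] * workers
    for size in sorted(file_sizes_mb, reverse=True):
        least_loaded = loads.index(min(loads))
        loads[least_loaded] += size
    return max(loads) / mb_per_second


single_file = estimated_scan_time([800], workers=4)
four_files = estimated_scan_time([200, 200, 200, 200], workers=4)
print(f"one 800 MB file: {single_file:.1f} s, four 200 MB files: {four_files:.1f} s")
```

Under these assumptions the same 800 MB finishes in a quarter of the time when it is spread across four files, which is the effect this comment is pointing at. How much of that gain shows up in practice depends on how the engine splits files and on file sizes, so treat the numbers as illustrative only.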

upvoted 1 time
