Exam Data-Engineer-Associate Topic 2 Question 249 Discussion

Actual exam question for Amazon's Data-Engineer-Associate exam
Question #: 249
Topic #: 2
An ecommerce company collects daily customer transaction logs in CSV format and stores the logs in Amazon S3. The company uses Amazon Athena to scan a subset of attributes from the logs on the same day the company receives each log.
Query times are increasing because of increasing transaction volume. The company wants to improve query performance.
Which solution will meet these requirements with the SHORTEST query times?

Suggested Answer: D

Amazon Athena achieves the fastest query performance when data is stored in columnar formats such as Apache Parquet and when queries can take advantage of partition pruning and predicate pushdown.
Converting the CSV files to Parquet significantly reduces the amount of data scanned because Parquet stores data in a column-oriented layout. The company's queries read only a subset of attributes, so Athena reads only the required columns instead of entire rows, which dramatically improves performance. Predicate pushdown reduces query time further: Athena uses the min/max statistics that Parquet stores for each row group to skip blocks of data that cannot match the query's filter.
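The effect of column projection and predicate pushdown can be illustrated with a toy model. It is a sketch under stated assumptions, not how Athena meters bytes scanned: each value costs the length of its text form, rows are grouped into fixed-size row groups, and per-group min/max values (computed on the fly here) stand in for the statistics Parquet keeps in its file footer. The column names, sample data, and helper functions are hypothetical.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def value_bytes(value: object) -> int:
    # Hypothetical cost model: a value costs the length of its text form.
    return len(str(value).encode("utf-8"))


def row_oriented_bytes(rows: List[Row]) -> int:
    # A CSV scan reads every field of every row, including unused columns.
    return sum(value_bytes(v) for row in rows for v in row.values())


def columnar_bytes(
    rows: List[Row],
    columns: Sequence[str],
    row_group_size: int,
    filter_col: str,
    lo: int,
    hi: int,
) -> int:
    total = 0
    for start in range(0, len(rows), row_group_size):
        group = rows[start:start + row_group_size]
        # Stand-in for per-row-group statistics stored in a Parquet footer.
        stats = [row[filter_col] for row in group]
        # Predicate pushdown: if min/max prove that no row can satisfy
        # lo <= value <= hi, the whole row group is skipped unread.
        if max(stats) < lo or min(stats) > hi:
            continue
        # Column projection: only the requested columns are read.
        total += sum(value_bytes(row[c]) for row in group for c in columns)
    return total


# Hypothetical transaction log: 1,000 rows with a wide, unused "notes" column.
rows = [
    {"order_id": f"o{i:04d}", "customer": f"customer-{i % 50}",
     "amount": i, "notes": "x" * 40}
    for i in range(1000)
]

# Roughly: SELECT order_id, amount ... WHERE amount BETWEEN 900 AND 999
print(row_oriented_bytes(rows))  # 58690
print(columnar_bytes(rows, ["order_id", "amount"], 100, "amount", 900, 999))  # 800
```

In this model the columnar scan touches two of four columns in one of ten row groups, which is why the gap is so large; real savings depend on compression, encoding, and how well the data is sorted on the filter column.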
Partitioning the data by date lets Athena skip every partition except the one for the queried day, provided the query filters on the date partition column, which minimizes unnecessary data reads. Storing one Parquet file per day also avoids the overhead of many small files, each of which adds its own request and metadata cost.
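Partition pruning can be sketched the same way. The example assumes a Hive-style layout in which the date is encoded in the S3 key as `dt=YYYY-MM-DD`; the bucket name, prefix, and helper functions are hypothetical, and in Athena the pruning is done by the engine when the WHERE clause filters on the partition column.

```python
from datetime import date, timedelta
from typing import List, Optional


def partition_value(key: str, column: str = "dt") -> Optional[str]:
    # Hive-style layout: the partition value appears in the path as column=value.
    for segment in key.split("/"):
        if segment.startswith(column + "="):
            return segment[len(column) + 1:]
    return None


def prune(keys: List[str], wanted_day: date) -> List[str]:
    # Keep only objects under dt=<wanted_day>; every other partition is
    # skipped without reading any of its bytes.
    target = wanted_day.isoformat()
    return [k for k in keys if partition_value(k) == target]


# Hypothetical bucket and prefix: 30 days of logs, one Parquet file per day.
start = date(2026, 3, 13)
keys = [
    f"s3://example-bucket/transactions/dt={(start + timedelta(days=n)).isoformat()}/part-0000.parquet"
    for n in range(30)
]

today = prune(keys, date(2026, 4, 11))
print(len(keys), len(today))  # 30 1
```

A same-day query therefore touches one partition out of thirty here, and the fraction skipped keeps growing as more days of logs accumulate.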
ORC is also a columnar format, but AWS exam guidance more commonly recommends Parquet for Athena workloads. JSON is a row-oriented text format and Avro is a row-oriented binary format, so both force Athena to read whole records, which increases the data scanned and slows query execution.
Therefore, Option D provides the shortest query times and aligns with Athena performance best practices.

by dhruvbakshi316 at Apr 11, 2026, 03:18 PM

Comments

dhruvbakshi316
2026-04-11 15:18:22
Selected Answer: A
ORC is a columnar format. Because the company scans only a subset of attributes, Athena reads only the needed columns, which dramatically reduces the data scanned.

Multiple files let Athena parallelize reads across several workers at once, which is critical for the shortest query times.

Partitioning by date lets Athena skip irrelevant partitions (queries run on same-day data).

Columnar pushdown filters apply filtering at the storage layer, minimizing the data read from S3.

The combination of parallelism, columnar format, partitioning, and pushdown delivers the fastest query performance.
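A rough model of the parallelism argument treats each file as one unit of work handed to the least-loaded worker. This assumes each file is read by a single worker; Parquet and ORC are splittable formats, so an engine may also divide one large file across several workers, which this sketch deliberately ignores. The file sizes, worker count, and function name are hypothetical.

```python
import heapq
from typing import List


def wall_clock_units(file_sizes_mb: List[int], workers: int) -> int:
    # Largest-first greedy scheduling: give each file to the currently
    # least-loaded worker. The query finishes when the busiest worker does.
    loads = [0] * workers
    heapq.heapify(loads)
    for size in sorted(file_sizes_mb, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + size)
    return max(loads)


# Hypothetical day of data: 1,024 MB as one file or as eight 128 MB files.
print(wall_clock_units([1024], workers=8))     # 1024
print(wall_clock_units([128] * 8, workers=8))  # 128
```

Under that one-worker-per-file assumption, eight 128 MB files finish in one eighth of the time of a single 1,024 MB file; how much of that difference survives in practice depends on how finely the engine splits each file.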