Exam Data-Engineer-Associate Topic 2 Question 249 Discussion
Actual exam question for Amazon's Data-Engineer-Associate exam
Question #: 249
Topic #: 2
An ecommerce company collects daily customer transaction logs in CSV format and stores the logs in Amazon S3. The company uses Amazon Athena to scan a subset of attributes from the logs on the same day the company receives each log.
Query times are increasing because of increasing transaction volume. The company wants to improve query performance.
Which solution will meet these requirements with the SHORTEST query times?
Suggested Answer: D
Amazon Athena achieves the fastest query performance when data is stored in columnar formats such as Apache Parquet and when queries can take advantage of partition pruning and predicate pushdown.
Converting the CSV files to Parquet significantly reduces the amount of data scanned because Parquet stores data in a column-oriented layout. Because the company's Athena queries read only a subset of attributes, Athena reads only the required columns instead of entire rows, which greatly improves performance. Predicate pushdown reduces query time further: Parquet records min/max statistics for each column chunk in every row group, so Athena can skip row groups that cannot satisfy the query's filter.
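The row-group skipping described above can be sketched in a few lines. This is a simplified model, not Athena's or Parquet's actual code: `RowGroup`, `build_row_groups`, and `scan_with_pushdown` are hypothetical names, and a single `amount` column stands in for a real file with many columns. It shows how a reader compares the filter against each group's max statistic before deciding whether to read the group at all.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """Hypothetical stand-in for one Parquet row group: column values plus min/max stats."""
    amounts: List[float]
    min_amount: float
    max_amount: float


def build_row_groups(values: List[float], group_size: int) -> List[RowGroup]:
    """Split a column into fixed-size groups and record min/max statistics for each."""
    groups = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups


def scan_with_pushdown(groups: List[RowGroup], threshold: float) -> Tuple[List[float], int]:
    """Evaluate `amount > threshold`, reading only row groups whose stats allow a match."""
    matches: List[float] = []
    groups_read = 0
    for group in groups:
        # Statistics check: if even the group's maximum fails the predicate, skip it unread.
        if group.max_amount <= threshold:
            continue
        groups_read += 1
        matches.extend(a for a in group.amounts if a > threshold)
    return matches, groups_read


if __name__ == "__main__":
    # Amounts are sorted ascending, so the statistics are highly selective.
    groups = build_row_groups([float(i) for i in range(1000)], group_size=100)
    matches, groups_read = scan_with_pushdown(groups, threshold=950.0)
    print(f"read {groups_read} of {len(groups)} row groups, {len(matches)} matching rows")
    # prints: read 1 of 10 row groups, 49 matching rows
```

The skip only pays off when values are clustered on the filtered column; with randomly ordered data, most groups span the whole value range and would still be read.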
Partitioning the data by date ensures that Athena scans only the relevant partitions for same-day queries, minimizing unnecessary data reads. Storing one Parquet file per day keeps the file count low and avoids the per-file overhead of many small files.
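Partition pruning can be illustrated with Hive-style `name=value` object keys. In this sketch the `transactions/` prefix, the `dt` partition column, and the file names are all hypothetical. It models only the filtering step: Athena itself learns partition locations from the table's catalog metadata rather than from a Python list.

```python
from typing import Dict, List


def parse_partitions(key: str) -> Dict[str, str]:
    """Extract Hive-style `name=value` path segments from an S3 object key."""
    partitions: Dict[str, str] = {}
    for segment in key.split("/"):
        if "=" in segment:
            name, value = segment.split("=", 1)
            partitions[name] = value
    return partitions


def prune(keys: List[str], column: str, value: str) -> List[str]:
    """Keep only objects in the partition the query filters on; the rest are never opened."""
    return [k for k in keys if parse_partitions(k).get(column) == value]


if __name__ == "__main__":
    # Hypothetical layout: one partition directory per day.
    keys = [
        "transactions/dt=2026-04-09/part-0000.parquet",
        "transactions/dt=2026-04-10/part-0000.parquet",
        "transactions/dt=2026-04-11/part-0000.parquet",
    ]
    # Equivalent in spirit to a query with: WHERE dt = '2026-04-11'
    print(prune(keys, "dt", "2026-04-11"))
```

Because the partition value is encoded in the path, the decision to skip a day's data costs nothing: no object from another day is opened.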
ORC is also a columnar format and is supported by Athena, but Parquet is the format most commonly recommended for Athena workloads in AWS exam guidance. JSON and Avro are row-oriented formats, so Athena must read whole records even when a query needs only a few attributes. This results in larger scan sizes and slower query execution.
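The difference in bytes scanned between row-oriented and column-oriented layouts can be approximated with the standard library. This is a rough model under stated assumptions: the six transaction fields and `sample_rows` data are made up, and compression and encoding (which shrink real Parquet files further) are ignored. A CSV-like layout must be read in full, while a columnar layout lets the reader touch only the selected columns.

```python
import csv
import io
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical transaction schema, for illustration only.
COLUMNS = ["txn_id", "customer_id", "amount", "sku", "store", "notes"]


def row_layout_bytes(rows: List[Row], columns: List[str]) -> int:
    """Row-oriented (CSV-like): whole records are stored together, so a scan reads every field."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return len(buf.getvalue().encode("utf-8"))


def columnar_bytes_read(rows: List[Row], selected: List[str]) -> int:
    """Column-oriented: each column is stored contiguously, so a scan reads only selected columns."""
    total = 0
    for column in selected:
        blob = "\n".join(row[column] for row in rows)
        total += len(blob.encode("utf-8"))
    return total


def sample_rows(n: int) -> List[Row]:
    """Synthetic transaction records (made-up values)."""
    return [
        {
            "txn_id": str(i),
            "customer_id": f"c{i % 50}",
            "amount": f"{i * 1.5:.2f}",
            "sku": f"SKU-{i:05d}",
            "store": "online",
            "notes": "gift wrap requested",
        }
        for i in range(n)
    ]


if __name__ == "__main__":
    rows = sample_rows(1000)
    selected = ["customer_id", "amount"]  # the "subset of attributes" a query needs
    row_bytes = row_layout_bytes(rows, COLUMNS)
    col_bytes = columnar_bytes_read(rows, selected)
    print(f"row layout: {row_bytes} bytes read, columnar: {col_bytes} bytes read")
```

The fewer columns a query selects relative to the table's width, the larger the gap becomes.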
Therefore, Option D provides the shortest query times and aligns with Athena performance best practices.
by dhruvbakshi316 at Apr 11, 2026, 03:18 PM
Comments
dhruvbakshi316
2026-04-11 15:18:22
Multiple files enable Athena to parallelize reads across multiple workers simultaneously; this is critical for the shortest query times.
Partitioning by date allows Athena to skip irrelevant partitions (queries run on same-day data).
Columnar predicate pushdown moves filtering down to the storage layer, minimizing the data read from S3.
The combination of parallelism, columnar format, partitioning, and pushdown delivers the fastest query performance.
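The parallelism point can be sketched as a split/scan/combine plan. This is an analogy, not Athena's engine: `split_into_files`, `scan_file`, and `parallel_total` are hypothetical names. Python threads do not speed up CPU-bound work because of the GIL, so the sketch shows the shape of the plan rather than a real speedup. Parquet files are also splittable at row-group boundaries, so how well one large file parallelizes depends on its row-group layout.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_into_files(values: List[float], n_files: int) -> List[List[float]]:
    """Hypothetical writer step: spread one day's records across up to n_files objects."""
    size = max(1, -(-len(values) // n_files))  # ceiling division; at least 1 to avoid a zero step
    return [values[i:i + size] for i in range(0, len(values), size)]


def scan_file(amounts: List[float]) -> float:
    """One worker scans one file and returns a partial aggregate."""
    return sum(amounts)


def parallel_total(files: List[List[float]], max_workers: int = 4) -> float:
    """Scan files concurrently, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(scan_file, files))
    return sum(partials)


if __name__ == "__main__":
    values = [float(i) for i in range(100)]
    files = split_into_files(values, n_files=4)
    print(len(files), parallel_total(files))  # prints: 4 4950.0
```

Each worker produces an independent partial result, so adding files (or row groups) adds units of work that can be scheduled in parallel.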