Data Store Management II
Explore techniques for optimizing AWS data stores, including cost-saving strategies for Amazon Redshift Spectrum queries, efficient synchronization of AWS Glue Data Catalog partitions, exporting large datasets from Redshift in Apache Parquet format, and automated data expiration in DynamoDB. You will also learn how to protect S3 objects with versioning and Object Lock, and how to design Redshift tables for better query performance. This lesson equips you to manage data store operations and security so you can improve performance and compliance in AWS environments.
Question 28
A logistics company has a large dataset in Amazon S3 that is queried frequently by Amazon Redshift using Redshift Spectrum. The data engineering team has observed that the same subset of S3 data is scanned repeatedly across multiple queries, resulting in high Spectrum costs. The team wants to reduce costs and improve query performance for these repeated queries without fully loading the entire dataset into Redshift.
Which solution should the data engineer implement?
A. Create Amazon Redshift materialized views over the Spectrum external tables to cache frequently accessed query results within Redshift.
B. Increase the number of Redshift compute nodes to process Spectrum queries faster.
C. Convert the S3 data from CSV to Apache Parquet format to reduce the amount of data scanned.
D. Enable Amazon Redshift concurrency scaling to handle the repeated query load.
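For context on option A, Amazon Redshift lets you define a materialized view over a Spectrum external table with standard DDL; repeated queries then read the precomputed result stored inside the cluster instead of re-scanning S3, which is what drives Spectrum's per-data-scanned charges down. The sketch below builds the relevant SQL statements in Python; the schema, table, and view names (`spectrum_schema`, `shipments`, `mv_daily_shipments`) are hypothetical, and note that materialized views over external tables must be refreshed manually.

```python
# Sketch of the SQL behind option A. The external table spectrum_schema.shipments
# (hypothetical name) is assumed to already exist in the Glue Data Catalog and
# be readable via Redshift Spectrum.

# The materialized view precomputes and stores the result inside Redshift,
# so repeated queries no longer scan the underlying S3 data.
create_mview_sql = """
CREATE MATERIALIZED VIEW mv_daily_shipments AS
SELECT ship_date, region, COUNT(*) AS shipment_count
FROM spectrum_schema.shipments   -- external (Spectrum) table backed by S3
GROUP BY ship_date, region;
"""

# Views over external tables are not auto-refreshed; run this on a schedule
# (e.g., after the daily ETL load) to pick up new S3 data.
refresh_sql = "REFRESH MATERIALIZED VIEW mv_daily_shipments;"

if __name__ == "__main__":
    print(create_mview_sql.strip())
    print(refresh_sql)
```

In practice these statements would be submitted through a SQL client or the Redshift Data API; the point of the design is that the per-query Spectrum scan cost is paid once at refresh time rather than on every analyst query.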
Question 29
A company partitions its S3 data lake by date using a year/month/day prefix structure. An ETL pipeline adds new partitions daily, but analysts report that Amazon Athena queries do not return ...