Query Optimization
- Partition data in S3 to enable partition pruning and reduce scanned data.
- Use columnar formats (Parquet, ORC) to scan only needed columns.
- Compress data (Snappy, ZSTD, GZIP) to reduce scan volume and cost.
- Use CTAS (CREATE TABLE AS SELECT) to write query results to a new table in an optimized layout, such as partitioned, compressed Parquet.
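As a rough illustration of the techniques above, the sketch below assembles a CTAS statement that rewrites a table as partitioned, Snappy-compressed Parquet. The helper `build_ctas` and all table, column, and bucket names are hypothetical. The `WITH` properties (`format`, `write_compression`, `external_location`, `partitioned_by`) are assumed to match Athena's CTAS syntax and should be checked against the documentation for the engine version in use.

```python
def build_ctas(new_table, source_table, columns, partition_cols, location,
               file_format="PARQUET", compression="SNAPPY"):
    """Return a CTAS statement that writes source_table as a partitioned columnar table.

    Hypothetical helper for illustration; it only builds the SQL text.
    """
    # Partition columns are placed last in the SELECT list, as CTAS expects.
    select_list = ", ".join(list(columns) + list(partition_cols))
    partitions = ", ".join(f"'{c}'" for c in partition_cols)
    return (
        f"CREATE TABLE {new_table}\n"
        f"WITH (\n"
        f"  format = '{file_format}',\n"
        f"  write_compression = '{compression}',\n"
        f"  external_location = '{location}',\n"
        f"  partitioned_by = ARRAY[{partitions}]\n"
        f")\n"
        f"AS SELECT {select_list}\n"
        f"FROM {source_table}"
    )


# All names below are made up for the example.
ctas_sql = build_ctas(
    new_table="logs_parquet",
    source_table="logs_raw",
    columns=["request_id", "status", "bytes_sent"],
    partition_cols=["year", "month"],
    location="s3://example-bucket/logs_parquet/",
)
print(ctas_sql)
```

A later query that filters on `year` and `month` and selects only a few columns from the new table would then benefit from both partition pruning and columnar reads.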
Management
- Workgroups separate users, enforce cost limits, and track query usage.
- Federated queries use Lambda-based data source connectors to query RDS, DynamoDB, and other sources.
- Athena uses the Glue Data Catalog as its default metastore.
- Pricing is based on the amount of data scanned per query, so optimization reduces both cost and latency.
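To make the pricing point concrete, here is a back-of-the-envelope estimate of per-query cost. The rate of 5 USD per TB scanned, the 10 MB per-query minimum, and the binary definition of a TB are assumptions based on commonly quoted figures; actual prices vary by region and over time. The function name `estimate_query_cost` and the scan sizes are illustrative only.

```python
TB = 1024 ** 4
GB = 1024 ** 3
MB = 1024 ** 2


def estimate_query_cost(bytes_scanned, usd_per_tb=5.0, min_bytes=10 * MB):
    """Estimate the cost of one query from the number of bytes it scans."""
    # Assumed per-query minimum: very small scans are billed as min_bytes.
    billed = max(bytes_scanned, min_bytes)
    return billed / TB * usd_per_tb


# Hypothetical scenario: a query that scans 1 TB of raw text data, versus
# the same query after partitioning and Parquet cut the scan to 50 GB.
full_scan_cost = estimate_query_cost(1 * TB)
optimized_cost = estimate_query_cost(50 * GB)
print(f"full scan: ${full_scan_cost:.2f}, optimized: ${optimized_cost:.2f}")
```

Because cost scales linearly with bytes scanned under these assumptions, any reduction in scan volume from partitioning, columnar formats, or compression translates directly into a proportional cost reduction.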
Exam Cues
- Need serverless SQL on S3: Athena.
- Need to reduce Athena costs: partition, use Parquet, and compress.
- Need to query external databases from Athena: federated queries.
- Need to limit per-team query spend: Athena workgroups with data scan limits.