Data Organization
- Use partitioned prefix structures (e.g., year/month/day) to enable partition pruning in Athena and Spectrum.
- Store data in columnar formats (Parquet, ORC) for faster queries and lower costs.
- Use compression (Snappy, GZIP, ZSTD) to reduce storage and scan costs.
- S3 Select and Glacier Select use SQL expressions to retrieve a subset of an object's data without downloading the full object.
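
To make the partitioned-prefix idea concrete, here is a minimal Python sketch of how object keys might be laid out and how a partition filter narrows the set of keys that must be read. The Hive-style `key=value` naming, the table name `sales`, and the helper names `partition_prefix`, `object_key`, and `prune` are illustrative assumptions, not part of any AWS API. The pruning here is only a string-prefix match standing in for what a query engine does with partition metadata.

```python
from datetime import date
from typing import List, Optional


def partition_prefix(table: str, day: date) -> str:
    """Build a Hive-style partition prefix for one day of data (assumed layout)."""
    return f"{table}/year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"


def object_key(table: str, day: date, filename: str) -> str:
    """Full object key: partition prefix plus file name."""
    return partition_prefix(table, day) + filename


def prune(keys: List[str], table: str, year: int, month: Optional[int] = None) -> List[str]:
    """Keep only keys under the requested year (and optionally month) partition."""
    prefix = f"{table}/year={year:04d}/"
    if month is not None:
        prefix += f"month={month:02d}/"
    return [k for k in keys if k.startswith(prefix)]


if __name__ == "__main__":
    keys = [
        object_key("sales", date(2023, 12, 31), "part-0000.parquet"),
        object_key("sales", date(2024, 3, 7), "part-0000.parquet"),
        object_key("sales", date(2024, 4, 1), "part-0000.parquet"),
    ]
    # Only the March 2024 partition survives the filter.
    print(prune(keys, "sales", 2024, month=3))
```

A query that filters on `year` and `month` only needs the objects under the matching prefix, which is why partitioning reduces the amount of data scanned.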
Lifecycle and Security
- Lifecycle policies transition objects between storage classes based on age.
- S3 Intelligent-Tiering automates class transitions for unpredictable access patterns.
- Bucket policies and IAM policies control access; Lake Formation adds fine-grained column/row control.
- S3 event notifications can invoke Lambda or publish to SQS, SNS, or EventBridge on events such as object creation or deletion.
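
As an illustration of an age-based lifecycle policy, the following Python sketch builds a rule document in the shape S3 lifecycle configurations use (`Rules`, `Filter`, `Transitions`, `Expiration`). The helper name `build_lifecycle_configuration`, the `raw/` prefix, and the day thresholds are assumptions chosen for the example. The sketch only constructs and prints the document; applying it to a bucket is left out.

```python
import json


def build_lifecycle_configuration(prefix: str, ia_days: int = 30,
                                  glacier_days: int = 90,
                                  expire_days: int = 365) -> dict:
    """Return a lifecycle configuration that tiers objects down as they age."""
    # Transitions must happen in increasing order of object age.
    if not (0 < ia_days < glacier_days < expire_days):
        raise ValueError("expected 0 < ia_days < glacier_days < expire_days")
    return {
        "Rules": [
            {
                "ID": f"tier-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},  # rule applies only under this prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_days},  # delete after this age
            }
        ]
    }


if __name__ == "__main__":
    config = build_lifecycle_configuration("raw/")
    # With boto3, a document of this shape is what would be passed as the
    # LifecycleConfiguration argument of put_bucket_lifecycle_configuration.
    print(json.dumps(config, indent=2))
```

Lifecycle rules like this suit data whose access pattern is predictable by age; for unpredictable access, Intelligent-Tiering avoids having to pick thresholds at all.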
Exam Cues
- Need to reduce Athena scan cost: partition data and use Parquet.
- Need to automate tiering: S3 Intelligent-Tiering or lifecycle policies.
- Need to trigger ETL on new data arrival: S3 event notification to Lambda, which can start a Glue job, or to EventBridge, which can start a Glue workflow.
- Need fine-grained data lake access: Lake Formation rather than S3 bucket policies alone.
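
The "trigger ETL on new data arrival" cue can be sketched as a Lambda handler that reads an S3 event notification and collects the newly created objects. The function name `new_object_uris`, the bucket name in the sample, and the returned dictionary shape are illustrative assumptions. The sketch stops at extracting the object locations; starting a Glue job from the handler would be an extra SDK call that is not shown here.

```python
from typing import List
from urllib.parse import unquote_plus


def new_object_uris(event: dict) -> List[str]:
    """Return s3:// URIs for every ObjectCreated record in an S3 notification."""
    uris = []
    for record in event.get("Records", []):
        # Skip deletions and any other non-creation events.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification, so decode them.
        key = unquote_plus(record["s3"]["object"]["key"])
        uris.append(f"s3://{bucket}/{key}")
    return uris


def handler(event: dict, context: object) -> dict:
    """Lambda entry point: report which new objects an ETL step should pick up."""
    return {"new_objects": new_object_uris(event)}


if __name__ == "__main__":
    sample_event = {
        "Records": [
            {
                "eventName": "ObjectCreated:Put",
                "s3": {
                    "bucket": {"name": "my-data-lake"},
                    "object": {"key": "sales/year%3D2024/month%3D03/day%3D07/part-0000.parquet"},
                },
            }
        ]
    }
    print(handler(sample_event, None))
```

Decoding the key matters for partitioned layouts in particular, because characters such as `=` in `year=2024` are percent-encoded in the event payload.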