📋 S3 for Data Lakes Cheat Sheet

S3 is the backbone of AWS data lakes. DEA-C01 tests your knowledge of storage organization, formats, access, and lifecycle management.

Data Organization

  • Use partitioned prefix structures (e.g., year/month/day) to enable partition pruning in Athena and Spectrum.
  • Store data in columnar formats (Parquet, ORC) so queries read only the columns they need, which speeds them up and lowers scan costs.
  • Use compression (Snappy, GZIP, ZSTD) to reduce storage and scan costs.
  • S3 Select and Glacier Select retrieve subsets of objects without downloading the full object.
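
As a minimal sketch of the partitioned prefix idea, the helper below builds date-partitioned object keys. It assumes the Hive-style `key=value` variant of the year/month/day layout; the table name `events` and the file name are hypothetical, not taken from any real data lake.

```python
from datetime import date


def partition_key(table: str, day: date, filename: str) -> str:
    """Build a Hive-style partitioned S3 key (key=value path segments)."""
    return (
        f"{table}/year={day.year:04d}/month={day.month:02d}/"
        f"day={day.day:02d}/{filename}"
    )


def prefix_for_month(table: str, year: int, month: int) -> str:
    """Prefix covering one month; a query filtered on year and month
    only needs to scan objects under this prefix."""
    return f"{table}/year={year:04d}/month={month:02d}/"


# Hypothetical table and file names, for illustration only.
key = partition_key("events", date(2024, 3, 7), "part-0000.snappy.parquet")
print(key)  # events/year=2024/month=03/day=07/part-0000.snappy.parquet
```

Zero-padding the month and day keeps prefixes sortable as strings, so listing a prefix returns partitions in date order.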

Lifecycle and Security

  • Lifecycle policies transition objects between storage classes based on age.
  • S3 Intelligent-Tiering automates class transitions for unpredictable access patterns.
  • Bucket policies and IAM policies control access; Lake Formation adds fine-grained column/row control.
  • S3 event notifications trigger Lambda, SQS, or SNS on object creation or deletion.
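
An age-based lifecycle policy can be sketched as a plain configuration dictionary. The prefix `raw/` and the day thresholds are illustrative assumptions, and the boto3 call is left as a comment because it needs AWS credentials and a real bucket.

```python
def lifecycle_config(prefix: str, ia_days: int, glacier_days: int) -> dict:
    """Build a lifecycle configuration that moves objects under `prefix`
    to cheaper storage classes as they age."""
    if glacier_days <= ia_days:
        raise ValueError("glacier_days must be greater than ia_days")
    return {
        "Rules": [
            {
                "ID": f"tier-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


# Illustrative thresholds: Standard-IA after 30 days, Glacier after 90.
config = lifecycle_config("raw/", ia_days=30, glacier_days=90)

# Applying it with boto3 would look roughly like this
# (bucket name is hypothetical):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-data-lake-bucket", LifecycleConfiguration=config
#   )
```

Lifecycle rules suit data whose access pattern follows its age; when access is unpredictable, Intelligent-Tiering avoids having to pick thresholds at all.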

Exam Cues

  • Need to reduce Athena scan cost: partition the data and store it as Parquet.
  • Need to automate tiering: S3 Intelligent-Tiering (unpredictable access) or lifecycle policies (age-based rules).
  • Need to trigger ETL when new data arrives: S3 event notification to Lambda, which can then start a Glue job.
  • Need fine-grained data lake access: Lake Formation rather than S3 bucket policies alone.
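
The "trigger ETL on new data" cue can be sketched as a Lambda handler that pulls the bucket and key out of an S3 event notification. The `raw/` prefix filter and the function names are assumptions for illustration, and the step that would actually start the ETL job is only indicated in a comment.

```python
from urllib.parse import unquote_plus


def new_objects(event: dict, prefix: str = "") -> list:
    """Return (bucket, key) pairs for object-creation records under `prefix`."""
    found = []
    for record in event.get("Records", []):
        # Skip anything that is not an object-creation event.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded in the notification, so decode them first.
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(prefix):
            found.append((bucket, key))
    return found


def handler(event, context):
    """Hypothetical Lambda entry point."""
    targets = new_objects(event, prefix="raw/")
    for bucket, key in targets:
        # A real handler might start a Glue job here, for example with
        # boto3.client("glue").start_job_run(JobName=...); omitted so the
        # sketch runs without AWS access.
        print(f"new object: s3://{bucket}/{key}")
    return {"processed": len(targets)}
```

Decoding the key matters for partitioned layouts, because characters such as `=` in `year=2024` are percent-encoded in the event payload.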
