📋 AWS Glue Cheat Sheet

AWS Glue is the centerpiece of DEA-C01 ETL questions: crawlers, Data Catalog, ETL jobs, DataBrew, job bookmarks, and schema management.

Why This Cheat Sheet Matters for DEA-C01

This cheat sheet covers the AWS Glue concepts most often tested on the DEA-C01 (AWS Certified Data Engineer - Associate) exam. It contains 3 sections with 12 key points to memorize before exam day. AWS Glue is a fully managed, serverless ETL service; the topics to master are Glue jobs, crawlers, the Data Catalog, DataBrew, Glue Studio, job bookmarks, and Spark-based transformations. Use it as a quick reference during your final review sessions.

Core Components

  • Crawlers automatically discover schema and populate the Glue Data Catalog.
  • The Data Catalog is a centralized metadata repository for databases, tables, and partitions.
  • Glue ETL jobs run Apache Spark (Python/Scala) or Python Shell scripts in a serverless environment.
  • DataBrew provides a visual, no-code interface for data preparation and profiling.
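
To make the crawler-to-catalog relationship concrete, here is a minimal Python sketch that assembles the parameters for the Glue CreateCrawler API call as exposed by boto3 (create_crawler and start_crawler). The crawler name, role ARN, database name, and S3 path are hypothetical placeholders, and the helper functions are illustrative, not part of any AWS library.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Return keyword arguments for a Glue CreateCrawler call."""
    return {
        "Name": name,
        "Role": role_arn,                 # IAM role the crawler assumes
        "DatabaseName": database,         # Data Catalog database to populate
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",  # update tables on schema change
            "DeleteBehavior": "LOG",                 # only log objects that disappear
        },
    }


def create_and_start_crawler(glue_client, request):
    """Create the crawler, then start a run that writes tables to the catalog."""
    glue_client.create_crawler(**request)
    glue_client.start_crawler(Name=request["Name"])


# Hypothetical values, for illustration only.
request = build_crawler_request(
    name="sales-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    database="sales_db",
    s3_path="s3://example-bucket/sales/",
)
print(request["Targets"])
```

With AWS credentials configured, the request could be submitted with create_and_start_crawler(boto3.client("glue"), request). Once the run finishes, the discovered tables appear in the named Data Catalog database.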

Job Management

  • Job bookmarks track previously processed data to enable incremental ETL.
  • Glue workflows orchestrate multiple crawlers and jobs into a single pipeline.
  • Glue Studio provides a visual drag-and-drop interface for building ETL pipelines.
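  • DynamicFrames are Glue's alternative to Spark DataFrames: each record is self-describing, so no schema is required up front, and built-in transformations such as ApplyMapping and ResolveChoice are available.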
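
Job bookmarks are switched on per job or per run through the --job-bookmark-option job argument. The sketch below, in Python, builds the parameters for the Glue StartJobRun API call as exposed by boto3 (start_job_run). The job name is a hypothetical placeholder and the helper function is illustrative only.

```python
# Values accepted by the --job-bookmark-option job argument.
BOOKMARK_OPTIONS = {
    "job-bookmark-enable",   # track state and skip already-processed data
    "job-bookmark-disable",  # always process the full dataset
    "job-bookmark-pause",    # process incrementally without updating the state
}


def build_job_run_request(job_name, bookmark_option="job-bookmark-enable"):
    """Return keyword arguments for a Glue StartJobRun call."""
    if bookmark_option not in BOOKMARK_OPTIONS:
        raise ValueError(f"unknown bookmark option: {bookmark_option}")
    return {
        "JobName": job_name,
        "Arguments": {"--job-bookmark-option": bookmark_option},
    }


# Hypothetical job name, for illustration only.
run_request = build_job_run_request("nightly-sales-etl")
print(run_request)
```

With AWS credentials configured, the run could be started with boto3.client("glue").start_job_run(**run_request). For the bookmark to take effect, the job script itself also needs a transformation_ctx on its sources and a job.commit() call at the end.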

Exam Cues

  • Need serverless ETL with schema discovery: AWS Glue.
  • Need visual data preparation without code: Glue DataBrew.
  • Need incremental processing to avoid re-reading old data: job bookmarks.
  • Need centralized metadata for Athena, EMR, and Redshift Spectrum: Glue Data Catalog.
