Core Components
- Crawlers automatically scan data stores (e.g., S3, JDBC sources), infer schemas, and populate the Glue Data Catalog.
- The Data Catalog is a centralized metadata repository for databases, tables, and partitions.
- Glue ETL jobs run Apache Spark (Python/Scala) or Python Shell scripts in a serverless environment.
- DataBrew provides a visual, no-code interface for data preparation and profiling.
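The Data Catalog's role is easiest to see as a hierarchy of metadata: databases contain tables, and tables carry columns, a storage location, and partitions. The sketch below models that hierarchy locally in plain Python; the database/table names, columns, and S3 path are illustrative placeholders, not real Glue API objects.

```python
# Minimal local sketch of the Data Catalog hierarchy
# (database -> table -> columns/partitions/location).
# All names and values here are hypothetical examples.

catalog = {
    "sales_db": {
        "orders": {
            "columns": [("order_id", "string"), ("amount", "double")],
            "partitions": ["year=2023/month=01", "year=2023/month=02"],
            "location": "s3://example-bucket/orders/",
        }
    }
}

def get_table(database: str, table: str) -> dict:
    """Look up table metadata, loosely mirroring glue:GetTable semantics."""
    return catalog[database][table]

meta = get_table("sales_db", "orders")
# meta["location"] tells a query engine (Athena, Redshift Spectrum)
# where the data lives; meta["partitions"] lets it prune reads.
```

This is the same shape of metadata that Athena, EMR, and Redshift Spectrum read from the real Data Catalog at query time.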
Job Management
- Job bookmarks track previously processed data to enable incremental ETL.
- Glue workflows orchestrate multiple crawlers and jobs into a single pipeline.
- Glue Studio provides a visual drag-and-drop interface for building ETL pipelines.
- DynamicFrames are Glue's schema-flexible alternative to Spark DataFrames: each record carries its own schema, and built-in transforms (e.g., ResolveChoice) handle inconsistent or evolving source schemas.
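The job-bookmark idea can be sketched in a few lines: remember a high-water mark (such as the latest modification timestamp processed) and, on the next run, pick up only newer data. This is an illustration of the concept, not Glue's internal implementation; the file records and timestamps are made-up examples.

```python
# Hedged sketch of bookmark-style incremental processing:
# keep the max timestamp seen so far, process only newer files,
# then advance the bookmark.

def incremental_batch(files, bookmark):
    """Return (files newer than the bookmark, updated bookmark)."""
    new = [f for f in files if f["modified"] > bookmark]
    new_bookmark = max((f["modified"] for f in new), default=bookmark)
    return new, new_bookmark

# First run processed everything up to timestamp 100.
files = [
    {"key": "a.json", "modified": 100},  # already processed
    {"key": "b.json", "modified": 200},  # new since last run
]
batch, bm = incremental_batch(files, bookmark=100)
# batch now holds only b.json, and bm advances to 200,
# so the next run skips both files unless newer data arrives.
```

In a real Glue job you enable this behavior with the `--job-bookmark-option` job parameter rather than writing it yourself; the point is only that bookmarks avoid re-reading old data.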
Exam Cues
- Need serverless ETL with schema discovery: AWS Glue.
- Need visual data preparation without code: Glue DataBrew.
- Need incremental processing to avoid re-reading old data: job bookmarks.
- Need centralized metadata for Athena, EMR, and Redshift Spectrum: Glue Data Catalog.