🃏 AWS Glue Flashcards

Active recall cards for DEA-C01 Glue ETL, crawlers, Data Catalog, job bookmarks, and DataBrew.

All AWS Glue Flashcards

1

Q: What does a Glue crawler do?

A: Discovers data schema and populates the Glue Data Catalog with table definitions.

2

Q: What is the Glue Data Catalog?

A: A centralized metadata repository that stores database and table definitions, used by Athena, EMR, and Redshift Spectrum.

3

Q: How do Glue job bookmarks help?

A: They persist state from earlier runs, so each incremental ETL run processes only data that is new since the last successful run instead of reprocessing everything.
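The bookmark idea can be sketched in plain Python. This is a conceptual illustration only, not Glue's implementation or API: the `Bookmark` class, the `run_incremental` function, and the integer timestamps are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Bookmark:
    """Hypothetical stand-in for the state a job bookmark keeps between runs."""
    last_modified: int = 0  # newest modification time processed so far


def run_incremental(objects, bookmark):
    """Return the keys modified after the bookmark, then advance the bookmark."""
    new = [(key, ts) for key, ts in objects if ts > bookmark.last_modified]
    if new:
        bookmark.last_modified = max(ts for _, ts in new)
    return [key for key, _ in new]


bookmark = Bookmark()
# First run: nothing has been processed yet, so both files are picked up.
first_run = run_incremental([("a.csv", 100), ("b.csv", 200)], bookmark)
# Second run: only the file that arrived after the first run is picked up.
second_run = run_incremental(
    [("a.csv", 100), ("b.csv", 200), ("c.csv", 300)], bookmark
)
```

The second run skips `a.csv` and `b.csv` because their timestamps are not newer than the stored state, which is the behavior the card describes.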

4

Q: What is Glue DataBrew?

A: A visual data preparation tool for cleaning and normalizing data without writing code.

5

Q: What execution engine do Glue ETL jobs use?

A: Apache Spark (scripts written in PySpark or Scala) for distributed ETL, or Python shell for lightweight jobs that do not need a Spark cluster.

6

Q: What is a DynamicFrame?

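A: A Glue abstraction similar to a Spark DataFrame in which each record is self-describing, so no schema is required up front. It tolerates schema inconsistencies (a field with mixed types becomes a choice type) and provides built-in transforms such as ApplyMapping and ResolveChoice.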
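To make the "schema inconsistencies" point concrete, here is a plain-Python sketch of the idea, not the `awsglue` API. It assumes a small list of dict records; `infer_field_types` and `resolve_by_cast` are hypothetical helpers that loosely mirror what a choice type and a cast-style ResolveChoice accomplish.

```python
def infer_field_types(records):
    """Collect the set of Python type names seen for each field."""
    types = {}
    for record in records:
        for name, value in record.items():
            types.setdefault(name, set()).add(type(value).__name__)
    return types


def resolve_by_cast(records, field_name, cast):
    """Cast one field to a single type in every record that contains it."""
    return [
        {**r, field_name: cast(r[field_name])} if field_name in r else dict(r)
        for r in records
    ]


# "id" arrives as both int and str, and one record has no "price" at all.
records = [{"id": 1, "price": 9.5}, {"id": "2", "price": 12.0}, {"id": 3}]

types_before = infer_field_types(records)    # "id" has two candidate types
resolved = resolve_by_cast(records, "id", int)
types_after = infer_field_types(resolved)    # "id" now has a single type
```

A fixed-schema structure would have to pick one type for `id` before loading; keeping the ambiguity visible and resolving it explicitly afterwards is the design choice the card is pointing at.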
