📋 MLA-C01 Data Preparation Cheat Sheet

Data preparation is the largest MLA-C01 domain. Expect scenarios about data ingestion, cleaning, transformation, validation, and feature readiness.

Core Workflow

  • Ingest source data into durable storage, commonly S3.
  • Discover schema and metadata with Glue crawlers and the Data Catalog.
  • Transform and clean data with Glue, DataBrew, SageMaker Processing, or notebooks.
  • Validate schema, missing values, class balance, and outliers before training.
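The validation step can be sketched with the Python standard library. This is a minimal illustration with a few assumptions:

- Rows arrive as a list of dicts.
- Outliers are flagged with the 1.5 × IQR rule, a common heuristic and not an exam-mandated method.
- The function name, schema format, and report keys are hypothetical.

The same checks are often run with pandas, a Glue DataBrew profile job, or a SageMaker Processing job instead.

```python
from collections import Counter
from statistics import quantiles


def validate(rows, schema, label, iqr_k=1.5):
    """Run basic pre-training checks on a list-of-dicts dataset.

    Hypothetical helper: `schema` maps column name -> expected Python type.
    """
    report = {"schema_errors": [], "missing": {}, "class_counts": {}, "outliers": {}}

    # 1. Schema: each expected column is present and non-null values have the expected type.
    for i, row in enumerate(rows):
        for col, typ in schema.items():
            if col not in row:
                report["schema_errors"].append((i, col, "missing column"))
            elif row[col] is not None and not isinstance(row[col], typ):
                report["schema_errors"].append((i, col, f"expected {typ.__name__}"))

    # 2. Missing values: count None per column.
    for col in schema:
        n_missing = sum(1 for r in rows if r.get(col) is None)
        if n_missing:
            report["missing"][col] = n_missing

    # 3. Class balance: label frequencies (rows without a label are skipped).
    report["class_counts"] = dict(Counter(r[label] for r in rows if r.get(label) is not None))

    # 4. Outliers: flag numeric feature values outside [Q1 - k*IQR, Q3 + k*IQR].
    for col, typ in schema.items():
        if col == label or typ not in (int, float):
            continue
        vals = [r[col] for r in rows if isinstance(r.get(col), (int, float))]
        if len(vals) < 4:  # too few points for meaningful quartiles
            continue
        q1, _, q3 = quantiles(vals, n=4)
        low, high = q1 - iqr_k * (q3 - q1), q3 + iqr_k * (q3 - q1)
        flagged = [v for v in vals if v < low or v > high]
        if flagged:
            report["outliers"][col] = flagged

    return report


# Small illustrative dataset (made up for this example).
rows = [
    {"age": 30.0, "income": 50.0, "label": 0},
    {"age": 32.0, "income": 52.0, "label": 0},
    {"age": 31.0, "income": None, "label": 0},
    {"age": 29.0, "income": 51.0, "label": 1},
    {"age": 33.0, "income": 49.0, "label": 0},
    {"age": 300.0, "income": 50.0, "label": 0},
    {"age": "x", "income": 53.0, "label": 1},
]
report = validate(rows, {"age": float, "income": float, "label": int}, label="label")
print(report)
```

On this toy data the report catches four problems:

- The string in `age` (row 6) is a schema error.
- One `income` value is missing.
- The labels split 5:2.
- 300.0 is an `age` outlier.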

Exam Cues

  • Need serverless SQL over S3: Athena.
  • Need managed ETL and cataloging: AWS Glue.
  • Need reusable low-latency features: SageMaker Feature Store online store.
  • Need training datasets and historical features: SageMaker Feature Store offline store (persisted in S3) or plain S3 datasets.
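Athena queries can also be run programmatically. This is a minimal sketch, assuming a boto3 Athena client is passed in (for example `boto3.client("athena")`). It uses three boto3 Athena operations:

- `start_query_execution`
- `get_query_execution`
- `get_query_results`

The function name, database name, and S3 output location are hypothetical placeholders. The sketch reads only the first page of results and has no backoff tuning. Because the Feature Store offline store is written to S3 and typically registered in the Glue Data Catalog, historical features can usually be queried the same way.

```python
import time


def run_athena_query(client, sql, database, output_s3, poll_seconds=1.0, max_polls=300):
    """Run SQL on Athena and return result rows as lists of strings.

    `client` is assumed to be a boto3 Athena client, e.g. boto3.client("athena").
    """
    query_id = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},  # a Glue Data Catalog database
        ResultConfiguration={"OutputLocation": output_s3},  # S3 prefix for result files
    )["QueryExecutionId"]

    # Poll until the query leaves the QUEUED/RUNNING states.
    for _ in range(max_polls):
        status = client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Athena query {query_id} ended in state {state}")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Athena query {query_id} still running after {max_polls} polls")

    # First page only (no NextToken handling). For SELECT queries, row 0 holds the column names.
    result = client.get_query_results(QueryExecutionId=query_id)
    return [[cell.get("VarCharValue") for cell in row["Data"]] for row in result["ResultSet"]["Rows"]]
```

Example call, with a hypothetical table and bucket:

`run_athena_query(boto3.client("athena"), "SELECT label, COUNT(*) FROM training_data GROUP BY label", "ml_db", "s3://example-bucket/athena-results/")`

Passing the client in, rather than creating it inside the function, also lets the logic be tested with a stub and no AWS access.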
