Why This Cheat Sheet Matters for MLA-C01
This cheat sheet covers the most important Data Preparation for Machine Learning concepts tested on the MLA-C01 (AWS Certified Machine Learning Engineer - Associate) certification exam. It contains 2 sections with 8 key points that you should memorize before exam day. Use it to prepare for MLA-C01 scenarios on data ingestion, cleaning, transformation, validation, labeling, and feature readiness across AWS data services, and as a quick-reference guide during your final review sessions.
Core Workflow
- Ingest source data into durable storage, commonly S3.
- Discover schema and metadata with Glue crawlers and the Data Catalog.
- Transform and clean data with Glue, DataBrew, SageMaker Processing, or notebooks.
- Validate schema, missing values, class balance, and outliers before training.
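The validation step can be sketched in plain Python. This is a minimal, hypothetical illustration, assuming the data is already loaded as a list of dict rows (for example via `csv.DictReader`). The function name `validate_rows`, the column names, and the 1.5 * IQR outlier rule (Tukey's fences) are choices made for this example, not part of any AWS service; in practice, checks like these would run inside one of the tools listed above, such as a SageMaker Processing job.

```python
from collections import Counter
import statistics


def validate_rows(rows, expected_columns, label_column, numeric_column):
    """Hypothetical pre-training checks on a list of dict rows."""
    report = {}

    # Schema: columns that never appear, and columns nobody asked for.
    actual = set().union(*(row.keys() for row in rows))
    report["missing_columns"] = sorted(set(expected_columns) - actual)
    report["unexpected_columns"] = sorted(actual - set(expected_columns))

    # Missing values: count None or empty strings per expected column.
    report["null_counts"] = {
        col: sum(1 for row in rows if row.get(col) in (None, ""))
        for col in expected_columns
    }

    # Class balance: ratio of the most frequent label to the least frequent.
    labels = Counter(
        row[label_column] for row in rows if row.get(label_column) not in (None, "")
    )
    report["label_counts"] = dict(labels)
    report["imbalance_ratio"] = (
        max(labels.values()) / min(labels.values()) if labels else None
    )

    # Outliers: values outside Tukey's fences (1.5 * IQR beyond Q1 and Q3).
    values = [
        float(row[numeric_column])
        for row in rows
        if row.get(numeric_column) not in (None, "")
    ]
    if len(values) >= 4:
        q1, _, q3 = statistics.quantiles(values, n=4)
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        report["outliers"] = [v for v in values if v < low or v > high]
    else:
        report["outliers"] = []
    return report
```

For example, `validate_rows(rows, ["amount", "label"], "label", "amount")` returns a report whose `imbalance_ratio` and `outliers` entries could gate whether a training run proceeds; the thresholds for failing a run are left to you.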
Exam Cues
- Need serverless SQL over S3: Athena.
- Need managed ETL and cataloging: AWS Glue.
- Need reusable low-latency features: SageMaker Feature Store online store.
- Need training datasets and historical features: SageMaker Feature Store offline store (kept in S3) or datasets stored directly in S3.
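To make the Athena cue concrete, here is a minimal sketch of running serverless SQL over an S3-backed table registered in the Glue Data Catalog. It assumes a boto3 Athena client is passed in (for example `boto3.client("athena")` with credentials configured); `start_query_execution`, `get_query_execution`, and `get_query_results` are standard boto3 Athena calls, while the helper name `run_athena_query` and the polling interval are hypothetical.

```python
import time

# Athena query states after which polling should stop.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Hypothetical helper: run a query and return rows as lists of strings.

    `athena` is assumed to be a boto3 Athena client.
    """
    # Start the query; Athena writes result files to the S3 output location.
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # Read only the first page of results; larger jobs would paginate via NextToken.
    result = athena.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]
```

A hypothetical call such as `run_athena_query(boto3.client("athena"), "SELECT label, COUNT(*) FROM training_data GROUP BY label", "ml_db", "s3://example-bucket/athena-results/")` would return the column headers as the first row, followed by the data rows. The database, table, and bucket names here are placeholders.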