What You'll Be Tested On
- Ingesting and transforming structured, semi-structured, and unstructured datasets
- Cleaning, normalizing, encoding, labeling, and validating data before training
- Managing datasets in Amazon S3, cataloging them with the AWS Glue Data Catalog, and querying them with Amazon Athena
- Creating reusable features and preventing data leakage
- Checking data quality, class imbalance, outliers, and schema drift
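Before moving these quality checks into a managed tool, you can prototype them in plain Python. The sketch below is illustrative only. The function names are hypothetical. The 1.5 × IQR fence (Tukey's rule) is one common outlier heuristic among several. Schemas are assumed to be simple `{column: type}` dictionaries rather than Glue Data Catalog table definitions.

```python
from collections import Counter
from statistics import quantiles


def class_balance(labels):
    """Return each class's share of the dataset, rounded to 3 decimals."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: round(n / total, 3) for cls, n in counts.items()}


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def schema_drift(expected, observed):
    """Compare two hypothetical {column: type_name} dicts and report changes."""
    missing = sorted(set(expected) - set(observed))
    added = sorted(set(observed) - set(expected))
    retyped = sorted(
        c for c in set(expected) & set(observed) if expected[c] != observed[c]
    )
    return {"missing": missing, "added": added, "retyped": retyped}


if __name__ == "__main__":
    # A 9:1 label split suggests resampling or class weighting may be needed.
    print(class_balance(["no"] * 9 + ["yes"]))
    # 95 falls far outside the fences computed from the other values.
    print(iqr_outliers([10, 12, 11, 13, 12, 11, 95]))
    # "income" changed type, "label" disappeared, "region" is new.
    print(schema_drift(
        {"age": "int", "income": "float", "label": "int"},
        {"age": "int", "income": "string", "region": "string"},
    ))
```

In practice, each check would run on every new batch of data before training, and a failed check would stop the pipeline instead of just printing a report.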
Key AWS Services in This Domain
Exam Tips for Domain 1
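- Start every data scenario by identifying the source, format, quality issue, and target ML workflow.
- When the data is already in Amazon S3, use AWS Glue crawlers and the Glue Data Catalog to discover schemas. Then use Glue ETL jobs or Amazon Athena SQL queries to explore and transform the data.
- Feature leakage is a common trap: never train on features built from information that would not be available at prediction time, such as values recorded after the event you are predicting.
- Know when an online feature store is needed for low-latency inference and when the offline store is the right source for building training datasets (for example, the online and offline stores in Amazon SageMaker Feature Store).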
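Here is a minimal sketch of the leakage tip. It assumes time-stamped records stored as Python dicts with hypothetical `ts` and `amount` fields. The sketch splits the data by time instead of at random. It also learns normalization statistics from the training split only, so nothing from the evaluation period influences training. The helper names are illustrative and do not belong to any AWS library.

```python
from statistics import mean, pstdev


def time_split(rows, cutoff):
    """Split rows by timestamp so all training data precedes evaluation data."""
    train = [r for r in rows if r["ts"] < cutoff]
    test = [r for r in rows if r["ts"] >= cutoff]
    return train, test


def fit_scaler(values):
    """Learn z-score parameters; call this on training values only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # guard against a constant column
    return mu, sigma


def transform(values, params):
    """Apply previously fitted parameters without refitting."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]


rows = [
    {"ts": 1, "amount": 10.0},
    {"ts": 2, "amount": 20.0},
    {"ts": 3, "amount": 30.0},
    {"ts": 4, "amount": 400.0},  # arrives after the cutoff
]

train, test = time_split(rows, cutoff=4)
params = fit_scaler([r["amount"] for r in train])  # test rows never seen here
train_scaled = transform([r["amount"] for r in train], params)
test_scaled = transform([r["amount"] for r in test], params)
```

Fitting the scaler on all four rows would let the late, extreme value shift the training mean. That is exactly the kind of future information that is unavailable at prediction time.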
Practice Domain 1 Questions
Test your knowledge of Data Preparation for Machine Learning with practice questions from our MLA-C01 question bank.