Domain 1 · 28% of Exam

Data Preparation for Machine Learning

Domain 1 validates your ability to ingest, transform, validate, prepare, and manage data for machine learning model development.

What You'll Be Tested On

  • Ingesting and transforming structured, semi-structured, and unstructured datasets
  • Cleaning, normalizing, encoding, labeling, and validating data before training
  • Managing datasets in Amazon S3, cataloging them with the AWS Glue Data Catalog, and querying them with Amazon Athena
  • Creating reusable features and preventing data leakage
  • Checking data quality, class imbalance, outliers, and schema drift
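The quality checks in the last bullet can be sketched in a few lines of plain Python. This is a minimal illustration, not an AWS API: the function names, the 1.5 × IQR outlier rule, and the `{column: type}` schema format are assumptions chosen for the example.

```python
from collections import Counter
from statistics import quantiles


def class_imbalance_ratio(labels):
    """Ratio of the most common class count to the least common one."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule of thumb)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def schema_drift(expected, record):
    """Compare one record against an expected {column: type} mapping."""
    missing = sorted(set(expected) - set(record))
    unexpected = sorted(set(record) - set(expected))
    wrong_type = sorted(
        col for col, typ in expected.items()
        if col in record and not isinstance(record[col], typ)
    )
    return {"missing": missing, "unexpected": unexpected, "wrong_type": wrong_type}


# Hypothetical sample data for illustration only.
labels = ["ok"] * 9 + ["fraud"]
amounts = [10, 12, 11, 13, 12, 11, 10, 95]
expected_schema = {"id": int, "amount": float}
drift = schema_drift(expected_schema, {"id": "7", "extra": 1})
```

A high imbalance ratio, a non-empty outlier list, or any non-empty entry in the drift report is a signal to investigate before training, not an automatic reason to drop data.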

Exam Tips for Domain 1

Start every data scenario by identifying the source, format, quality issue, and target ML workflow.

When the data is already in S3, use AWS Glue to discover, catalog, and transform it, and use Amazon Athena to query it with SQL.

Feature leakage is a common trap: do not use information that would not exist at prediction time.
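One common way to avoid this trap is to split by time and compute preprocessing statistics from the training portion only. The sketch below assumes hypothetical rows with a `ts` timestamp and an `amount` feature; the helper names are illustrative, not part of any library.

```python
from statistics import mean, pstdev


def chronological_split(rows, cutoff):
    """Train on rows before the cutoff; evaluate on rows at or after it."""
    train = [r for r in rows if r["ts"] < cutoff]
    test = [r for r in rows if r["ts"] >= cutoff]
    return train, test


def fit_scaler(values):
    """Compute mean and standard deviation from training values only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # guard against zero variance
    return mu, sigma


def scale(values, mu, sigma):
    """Apply previously fitted statistics; never refit on evaluation data."""
    return [(v - mu) / sigma for v in values]


# Hypothetical data for illustration only.
rows = [
    {"ts": 1, "amount": 10.0},
    {"ts": 2, "amount": 20.0},
    {"ts": 3, "amount": 30.0},
    {"ts": 4, "amount": 100.0},
]
train, test = chronological_split(rows, cutoff=4)
mu, sigma = fit_scaler([r["amount"] for r in train])
test_scaled = scale([r["amount"] for r in test], mu, sigma)
```

Fitting the scaler on all four rows would let the later value of 100.0 influence the training features, which is exactly the kind of information that would not exist at prediction time.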

Know when an online feature store is needed for low-latency inference and when the offline store supports training.
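The distinction can be pictured with a toy in-memory model. This is not the SageMaker Feature Store API; `ToyFeatureStore` and its methods are hypothetical names used only to show that the online view keeps the latest value per entity for fast lookups, while the offline view keeps full history for building point-in-time correct training sets.

```python
from bisect import bisect_right
from collections import defaultdict


class ToyFeatureStore:
    """Illustrative model only: online = latest record, offline = full history."""

    def __init__(self):
        self._online = {}                   # entity_id -> latest features
        self._offline = defaultdict(list)   # entity_id -> [(event_time, features)]

    def put(self, entity_id, event_time, features):
        history = self._offline[entity_id]
        history.append((event_time, features))
        history.sort(key=lambda item: item[0])
        # The online view always reflects the most recent event time.
        self._online[entity_id] = history[-1][1]

    def get_online(self, entity_id):
        """Single-key lookup, as a low-latency inference path would use."""
        return self._online.get(entity_id)

    def get_as_of(self, entity_id, as_of):
        """Latest features known at `as_of`, as a training-set builder would use."""
        history = self._offline.get(entity_id, [])
        times = [t for t, _ in history]
        idx = bisect_right(times, as_of)
        return history[idx - 1][1] if idx else None


store = ToyFeatureStore()
store.put("u1", 1, {"orders_30d": 1})
store.put("u1", 5, {"orders_30d": 4})
```

Here `store.get_online("u1")` serves the newest values for real-time inference, while `store.get_as_of("u1", 3)` returns what was known at time 3, which is what a training example labeled at that time should see.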

Practice Domain 1 Questions

Test your knowledge of Data Preparation for Machine Learning with practice questions from our MLA-C01 question bank.
