What You'll Be Tested On

Choosing real-time, serverless, asynchronous, or batch inference based on latency and traffic patterns
Deploying SageMaker endpoints with production variants, autoscaling, rollback, and blue/green deployment patterns
Building SageMaker Pipelines for processing, training, evaluation, registration, and approval
Using CI/CD and IaC to automate ML workflow orchestration
Optimizing production inference cost, throughput, and reliability

Key AWS Services in This Domain

💡

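Real-time endpoints are for low-latency requests on steady traffic, serverless inference suits intermittent traffic that can tolerate cold starts, asynchronous inference fits large payloads or long processing times, and batch transform is for offline scoring of whole datasets.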
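One way to rehearse this decision is to write it down as a small function. The sketch below is illustrative only: the `Workload` fields, the function name, and the default thresholds are assumptions made for this example, not SageMaker APIs or official quotas, so check the current service limits before relying on the numbers.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of an inference workload."""
    offline_dataset: bool          # score a whole dataset, nobody waits per request
    payload_mb: float              # size of a single request
    processing_seconds: float      # time needed to score a single request
    intermittent_traffic: bool     # long idle gaps between bursts of requests


def choose_inference_mode(
    w: Workload,
    max_sync_payload_mb: float = 6.0,   # assumed synchronous payload ceiling
    max_sync_seconds: float = 60.0,     # assumed synchronous processing ceiling
) -> str:
    """Return the inference option that matches the workload shape."""
    if w.offline_dataset:
        # No caller is waiting on individual predictions.
        return "batch transform"
    if w.payload_mb > max_sync_payload_mb or w.processing_seconds > max_sync_seconds:
        # Too large or too slow for a synchronous call: queue the request.
        return "asynchronous"
    if w.intermittent_traffic:
        # Idle periods make always-on instances wasteful.
        return "serverless"
    return "real-time"
```

For example, `choose_inference_mode(Workload(False, 200, 30, False))` returns `"asynchronous"` because the payload exceeds the assumed synchronous ceiling.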

💡

Separate orchestration from model code so retraining and deployment are repeatable.
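A minimal sketch of that separation, in plain Python rather than the SageMaker SDK: the model functions know nothing about step order or gating, and the orchestration function knows nothing about the math. All names here (`train`, `evaluate`, `run_pipeline`, `max_error`) are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple

Data = List[Tuple[float, float]]


# Model code: no knowledge of scheduling, gating, or deployment.
def train(data: Data) -> float:
    """Fit y = w * x by least squares through the origin and return w."""
    numerator = sum(x * y for x, y in data)
    denominator = sum(x * x for x, _ in data)
    return numerator / denominator


def evaluate(w: float, data: Data) -> float:
    """Return the mean squared error of y = w * x on the given data."""
    return sum((y - w * x) ** 2 for x, y in data) / len(data)


# Orchestration: fixes the step order and the quality gate only.
def run_pipeline(
    train_fn: Callable[[Data], float],
    eval_fn: Callable[[float, Data], float],
    train_data: Data,
    test_data: Data,
    max_error: float,
) -> Dict[str, object]:
    """Train, evaluate, and decide whether the model should be registered."""
    model = train_fn(train_data)
    error = eval_fn(model, test_data)
    return {"model": model, "error": error, "register": error <= max_error}
```

Because `run_pipeline` only receives callables, the same orchestration can be rerun with new data or a different training function, which is the repeatability the tip is after.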

💡

Endpoint autoscaling is driven by runtime signals such as invocations per instance, latency, or instance utilization, not by training metrics.
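As a rough sketch, endpoint autoscaling is configured through Application Auto Scaling with a target-tracking policy on a traffic metric. The helpers below only build the request dictionaries; the helper names, the policy name, and the cooldown values are assumptions for this example, while the keys follow the `register_scalable_target` and `put_scaling_policy` request shapes.

```python
from typing import Any, Dict


def resource_id(endpoint_name: str, variant_name: str) -> str:
    """Resource identifier that Application Auto Scaling uses for a variant."""
    return f"endpoint/{endpoint_name}/variant/{variant_name}"


def scalable_target(endpoint_name: str, variant_name: str,
                    min_instances: int, max_instances: int) -> Dict[str, Any]:
    """Arguments for register_scalable_target: instance-count bounds."""
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id(endpoint_name, variant_name),
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_instances,
        "MaxCapacity": max_instances,
    }


def invocations_policy(endpoint_name: str, variant_name: str,
                       target_invocations: float) -> Dict[str, Any]:
    """Arguments for put_scaling_policy: track invocations per instance."""
    return {
        "PolicyName": f"{variant_name}-invocations-target",  # hypothetical name
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id(endpoint_name, variant_name),
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_invocations),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": 300,   # assumed value, in seconds
            "ScaleOutCooldown": 60,   # assumed value, in seconds
        },
    }
```

With credentials in place, these dictionaries would be passed as keyword arguments to `boto3.client("application-autoscaling").register_scalable_target(...)` and `.put_scaling_policy(...)`. Note that nothing in the policy refers to training loss or accuracy.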

💡

Use approval gates when a model must be reviewed before production deployment.
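In the SageMaker Model Registry, an approval gate is usually expressed through a model package's `ModelApprovalStatus`, which takes the values `PendingManualApproval`, `Approved`, or `Rejected`. The sketch below assumes a dictionary shaped like a `describe_model_package` response; the helper names and the example ARN are hypothetical.

```python
from typing import Any, Dict

VALID_STATUSES = {"PendingManualApproval", "Approved", "Rejected"}


def can_deploy(model_package: Dict[str, Any]) -> bool:
    """Allow deployment only for packages a reviewer has approved."""
    return model_package.get("ModelApprovalStatus") == "Approved"


def review_request(model_package_arn: str, approved: bool,
                   reason: str) -> Dict[str, str]:
    """Arguments for update_model_package that record a review decision."""
    return {
        "ModelPackageArn": model_package_arn,
        "ModelApprovalStatus": "Approved" if approved else "Rejected",
        "ApprovalDescription": reason,
    }


# A freshly registered package waits for review and must not be deployed.
pending = {
    "ModelPackageArn": "arn:aws:sagemaker:region:account:model-package/example/1",
    "ModelApprovalStatus": "PendingManualApproval",
}
decision = review_request(pending["ModelPackageArn"], True, "Metrics reviewed")
```

A deployment job would call `can_deploy` on the package before creating or updating an endpoint, and a reviewer's decision would be sent with `boto3.client("sagemaker").update_model_package(**decision)`.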

Test your knowledge of Deployment and Orchestration of ML Workflows with practice questions from our MLA-C01 question bank.