Inference Options
- Real-time endpoints serve low-latency online predictions.
- Serverless inference reduces idle cost for intermittent traffic.
- Asynchronous inference supports larger payloads and longer processing times.
- Batch transform scores stored datasets without a persistent endpoint.
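The four options above map to different request shapes against the SageMaker API. The sketch below builds the request payloads as plain dictionaries; the model name, bucket, and resource names are hypothetical placeholders, while the field names follow the `create_endpoint_config` and `create_transform_job` APIs.

```python
MODEL = "my-model"  # hypothetical model name registered in SageMaker

# Real-time: a provisioned instance serving low-latency predictions.
realtime_config = {
    "EndpointConfigName": "rt-config",
    "ProductionVariants": [{
        "VariantName": "AllTraffic",
        "ModelName": MODEL,
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
    }],
}

# Serverless: no instance settings; capacity is set via ServerlessConfig,
# so you pay per invocation rather than for idle instances.
serverless_config = {
    "EndpointConfigName": "sls-config",
    "ProductionVariants": [{
        "VariantName": "AllTraffic",
        "ModelName": MODEL,
        "ServerlessConfig": {"MemorySizeInMB": 2048, "MaxConcurrency": 5},
    }],
}

# Asynchronous: requests are queued and results written to S3, which
# allows larger payloads and longer processing times.
async_config = {
    "EndpointConfigName": "async-config",
    "ProductionVariants": [{
        "VariantName": "AllTraffic",
        "ModelName": MODEL,
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
    }],
    "AsyncInferenceConfig": {
        "OutputConfig": {"S3OutputPath": "s3://my-bucket/async-out/"},
    },
}

# Batch transform: scores a stored dataset as a job, with no
# persistent endpoint left running afterwards.
batch_job = {
    "TransformJobName": "batch-score-001",
    "ModelName": MODEL,
    "TransformInput": {"DataSource": {"S3DataSource": {
        "S3DataType": "S3Prefix", "S3Uri": "s3://my-bucket/input/"}}},
    "TransformOutput": {"S3OutputPath": "s3://my-bucket/output/"},
    "TransformResources": {"InstanceType": "ml.m5.large",
                           "InstanceCount": 1},
}
```

In practice each payload would be passed to the corresponding `boto3` call, e.g. `boto3.client("sagemaker").create_endpoint_config(**serverless_config)` or `create_transform_job(**batch_job)`.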
Production Controls
- Use production variants on an endpoint for weighted traffic shifting or A/B testing.
- Configure autoscaling from endpoint metrics to handle traffic changes.
- Register and approve model versions in the Model Registry before promoting them to production.
- Monitor endpoint invocation metrics and model quality after release.
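Traffic shifting and autoscaling can be sketched together: two weighted production variants on one endpoint config, plus a target-tracking scaling policy registered through Application Auto Scaling. The endpoint, variant, and model names below are hypothetical; the field names, scalable dimension, and predefined metric follow the `create_endpoint_config`, `register_scalable_target`, and `put_scaling_policy` APIs.

```python
# Two variants on one endpoint: 90% of traffic to the approved model,
# 10% to the challenger for A/B comparison.
ab_endpoint_config = {
    "EndpointConfigName": "ab-config",
    "ProductionVariants": [
        {"VariantName": "Champion", "ModelName": "model-v1",
         "InstanceType": "ml.m5.large", "InitialInstanceCount": 2,
         "InitialVariantWeight": 0.9},
        {"VariantName": "Challenger", "ModelName": "model-v2",
         "InstanceType": "ml.m5.large", "InitialInstanceCount": 1,
         "InitialVariantWeight": 0.1},
    ],
}

# Register the champion variant's instance count as a scalable target.
scaling_target = {
    "ServiceNamespace": "sagemaker",
    "ResourceId": "endpoint/my-endpoint/variant/Champion",
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "MinCapacity": 2,
    "MaxCapacity": 10,
}

# Target-tracking policy driven by an endpoint invocation metric:
# add or remove instances to hold ~100 invocations per instance.
scaling_policy = {
    "PolicyName": "invocations-target-tracking",
    "ServiceNamespace": "sagemaker",
    "ResourceId": "endpoint/my-endpoint/variant/Champion",
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 100.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
        },
    },
}

# Effective traffic split across variants.
weights = {v["VariantName"]: v["InitialVariantWeight"]
           for v in ab_endpoint_config["ProductionVariants"]}
```

The scaling requests would go to `boto3.client("application-autoscaling")`, not the SageMaker client; variant weights can later be adjusted in place with `update_endpoint_weights_and_capacities` to shift more traffic to the challenger as confidence grows.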
Practice Model Deployment Questions
Put your knowledge to the test with practice questions.