
How to Deploy AI Agents in Production on AWS

Apr 07, 2026 · 11 min read


AWS · Agentic AI · Cost Optimization


Scenario

A startup wants to deploy AI agents for internal support automation. The initial target is low monthly cost, but the architecture must already include secure access, auditable actions, and a path to scale when usage grows.

Problem and Business Context

Most teams start with a single app server calling an LLM API directly. That works for demos, but production needs more:

  • identity-aware access control (who can call which tool)
  • secrets handling without hardcoded keys
  • queueing for spikes
  • observability for latency, failures, and token usage
  • clear blast-radius boundaries so an agent bug does not become an infrastructure incident

The business objective is to automate repetitive internal support tasks (policy lookup, ticket summarization, and approved action execution) while keeping the first 3-6 months of spend predictable.
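The "who can call which tool" requirement can be made concrete with a small allowlist check in the agent layer. The group and tool names below are hypothetical, just to illustrate the shape of the check:

```python
# Hypothetical mapping of identity-provider groups to permitted tools.
# Group and tool names are illustrative, not from any real configuration.
TOOL_ALLOWLIST = {
    "support-l1": {"policy_lookup", "ticket_summarize"},
    "support-l2": {"policy_lookup", "ticket_summarize", "ticket_update"},
}


def is_tool_call_allowed(user_groups: list[str], tool: str) -> bool:
    """Return True only if at least one of the caller's groups permits the tool."""
    return any(tool in TOOL_ALLOWLIST.get(group, set()) for group in user_groups)
```

The agent API would run this check before enqueueing a job, so an unauthorized tool request is rejected at the edge rather than discovered in the worker.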

Architecture Options and Trade-offs

Option A: Lambda + API Gateway + SQS + DynamoDB (recommended for this scenario)

  • Pros: lowest ops burden, scales to zero, strong IAM integration, good for bursty traffic
  • Cons: cold starts, careful timeout/async design needed

Option B: ECS Fargate always-on API + worker

  • Pros: stable low latency, easier long-running workflows
  • Cons: higher baseline cost than Lambda at low traffic

Option C: EKS + service mesh

  • Pros: maximum control
  • Cons: highest operational overhead and engineering time

For a cost-sensitive startup, Option A is the best first production shape.
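To compare options on cost, a rough estimator helps. This sketch deliberately takes per-token prices as inputs rather than hardcoding them; look current prices up on the official Bedrock pricing page:

```python
def estimate_monthly_llm_cost(
    requests_per_month: int,
    avg_input_tokens: int,
    avg_output_tokens: int,
    price_per_1k_input: float,   # USD, from the official Bedrock pricing page
    price_per_1k_output: float,  # USD, from the official Bedrock pricing page
) -> float:
    """Rough model-invocation cost only; excludes Lambda, API Gateway, SQS, DynamoDB."""
    input_cost = requests_per_month * avg_input_tokens / 1000 * price_per_1k_input
    output_cost = requests_per_month * avg_output_tokens / 1000 * price_per_1k_output
    return round(input_cost + output_cost, 2)
```

Running this for a few traffic scenarios (pilot, full rollout, worst case) gives a defensible spend range before any infrastructure is built.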

Recommended AWS Architecture

graph TD
  U[Internal Users] --> IdP[OIDC IdP or IAM Identity Center]
  U --> APIGW[API Gateway HTTP API + JWT Authorizer]
  APIGW --> L1[Lambda: FastAPI Agent API]
  L1 --> DDB[(DynamoDB Sessions + Audit State)]
  L1 --> SQS[(SQS Agent Jobs)]
  L1 --> SM[Secrets Manager]
  L1 --> BR[Amazon Bedrock]
  SQS --> L2[Lambda: Tool Worker]
  L2 --> DDB
  L2 --> S3[(S3 Knowledge/Artifacts)]
  L2 --> BR
  L1 --> CW[CloudWatch Logs + Metrics + Alarms]
  L2 --> CW
  CW --> SNS[SNS On-call Notifications]
  WAF[AWS WAF] --> APIGW

Production Design Notes

  • API ingress: API Gateway HTTP API with JWT authorizer.
  • Agent API: FastAPI in Lambda via Mangum.
  • Async tool execution: SQS + worker Lambda.
  • State: DynamoDB single-table design for sessions/jobs.
  • Secrets: AWS Secrets Manager with KMS encryption.
  • LLM: Amazon Bedrock model invocation with strict IAM resource scope.
  • Audit trail: CloudWatch Logs + DynamoDB immutable event records.
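The single-table design and the immutable audit records can be sketched as key-builder helpers. The USER#/JOB# shapes match the handlers shown later in this post; the AUDIT# shape is an assumed convention for append-only events:

```python
from datetime import datetime, timezone


def job_key(user_id: str, job_id: str) -> dict:
    # Matches the key shape used by the API and worker handlers in this post.
    return {"pk": f"USER#{user_id}", "sk": f"JOB#{job_id}"}


def audit_event_item(user_id: str, job_id: str, action: str) -> dict:
    # Assumed shape for append-only audit records: the timestamp in the sort
    # key makes each event unique, so records are written once, never updated.
    ts = datetime.now(timezone.utc).isoformat()
    return {
        "pk": f"USER#{user_id}",
        "sk": f"AUDIT#{ts}#{job_id}",
        "action": action,
    }
```

Keeping sessions, jobs, and audit events under one partition key per user means a single Query returns a user's full history in key order.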

Step-by-Step Implementation

1) Prerequisites

  • AWS CLI v2 configured
  • Python 3.12+
  • jq (bash workflow)
  • An OIDC provider for employee auth

Pricing note: verify all current prices in the official AWS Pricing pages before rollout.

2) Bootstrap environment variables

Bash:

export AWS_REGION=us-east-1
export PROJECT=internal-support-agent
export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export ARTIFACT_BUCKET=${PROJECT}-${ACCOUNT_ID}-${AWS_REGION}
export API_FN=${PROJECT}-api
export WORKER_FN=${PROJECT}-worker
export TABLE_NAME=${PROJECT}-state
export QUEUE_NAME=${PROJECT}-jobs
PowerShell:

$env:AWS_REGION = "us-east-1"
$env:PROJECT = "internal-support-agent"
$env:ACCOUNT_ID = (aws sts get-caller-identity --query Account --output text)
$env:ARTIFACT_BUCKET = "$($env:PROJECT)-$($env:ACCOUNT_ID)-$($env:AWS_REGION)"
$env:API_FN = "$($env:PROJECT)-api"
$env:WORKER_FN = "$($env:PROJECT)-worker"
$env:TABLE_NAME = "$($env:PROJECT)-state"
$env:QUEUE_NAME = "$($env:PROJECT)-jobs"

3) Create core data and messaging resources

Bash:

aws s3api create-bucket \
  --bucket "$ARTIFACT_BUCKET" \
  --region "$AWS_REGION"

aws dynamodb create-table \
  --table-name "$TABLE_NAME" \
  --attribute-definitions AttributeName=pk,AttributeType=S AttributeName=sk,AttributeType=S \
  --key-schema AttributeName=pk,KeyType=HASH AttributeName=sk,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST \
  --sse-specification Enabled=true

# Visibility timeout should be roughly 6x the worker Lambda timeout (AWS guidance for SQS event sources).
aws sqs create-queue \
  --queue-name "$QUEUE_NAME" \
  --attributes VisibilityTimeout=720,KmsMasterKeyId=alias/aws/sqs
PowerShell:

aws s3api create-bucket --bucket $env:ARTIFACT_BUCKET --region $env:AWS_REGION

aws dynamodb create-table `
  --table-name $env:TABLE_NAME `
  --attribute-definitions AttributeName=pk,AttributeType=S AttributeName=sk,AttributeType=S `
  --key-schema AttributeName=pk,KeyType=HASH AttributeName=sk,KeyType=RANGE `
  --billing-mode PAY_PER_REQUEST `
  --sse-specification Enabled=true

# Visibility timeout should be roughly 6x the worker Lambda timeout (AWS guidance for SQS event sources).
aws sqs create-queue `
  --queue-name $env:QUEUE_NAME `
  --attributes VisibilityTimeout=720,KmsMasterKeyId=alias/aws/sqs

4) Store secrets safely (no hardcoded credentials)

Bash:

aws secretsmanager create-secret \
  --name "${PROJECT}/config" \
  --description "Agent runtime secrets" \
  --secret-string '{"BEDROCK_MODEL_ID":"amazon.nova-lite-v1:0","ALLOWED_TOOL_DOMAINS":"jira.internal,confluence.internal"}'
PowerShell:

aws secretsmanager create-secret `
  --name "$($env:PROJECT)/config" `
  --description "Agent runtime secrets" `
  --secret-string '{"BEDROCK_MODEL_ID":"amazon.nova-lite-v1:0","ALLOWED_TOOL_DOMAINS":"jira.internal,confluence.internal"}'

5) IAM least-privilege role for Lambda

trust-policy.json

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"Service": "lambda.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }
  ]
}

Create role and attach minimum policies.

ROLE_ARN=$(aws iam create-role \
  --role-name "${PROJECT}-lambda-role" \
  --assume-role-policy-document file://trust-policy.json \
  --query Role.Arn --output text)

cat > permissions-policy.json << 'JSON'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Logs",
      "Effect": "Allow",
      "Action": ["logs:CreateLogGroup","logs:CreateLogStream","logs:PutLogEvents"],
      "Resource": "*"
    },
    {
      "Sid": "DynamoAccess",
      "Effect": "Allow",
      "Action": ["dynamodb:PutItem","dynamodb:GetItem","dynamodb:UpdateItem","dynamodb:Query"],
      "Resource": "arn:aws:dynamodb:*:*:table/internal-support-agent-state"
    },
    {
      "Sid": "QueueAccess",
      "Effect": "Allow",
      "Action": ["sqs:SendMessage","sqs:ReceiveMessage","sqs:DeleteMessage","sqs:GetQueueAttributes"],
      "Resource": "arn:aws:sqs:*:*:internal-support-agent-jobs"
    },
    {
      "Sid": "SecretsRead",
      "Effect": "Allow",
      "Action": ["secretsmanager:GetSecretValue"],
      "Resource": "arn:aws:secretsmanager:*:*:secret:internal-support-agent/config*"
    },
    {
      "Sid": "BedrockInvoke",
      "Effect": "Allow",
      "Action": ["bedrock:InvokeModel","bedrock:InvokeModelWithResponseStream"],
      "Resource": "*"
    }
  ]
}
JSON

aws iam put-role-policy \
  --role-name "${PROJECT}-lambda-role" \
  --policy-name "${PROJECT}-runtime-policy" \
  --policy-document file://permissions-policy.json

6) FastAPI agent API code

app/main.py

import json
import os
import uuid
from datetime import datetime, timezone

import boto3
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field
from mangum import Mangum

app = FastAPI(title="Internal Support Agent API")

ddb = boto3.resource("dynamodb")
sqs = boto3.client("sqs")
secrets = boto3.client("secretsmanager")

TABLE_NAME = os.environ["TABLE_NAME"]
QUEUE_URL = os.environ["QUEUE_URL"]
SECRET_NAME = os.environ["SECRET_NAME"]

table = ddb.Table(TABLE_NAME)


class AskRequest(BaseModel):
    user_id: str = Field(min_length=3)
    question: str = Field(min_length=3, max_length=4000)


@app.get("/health")
def health() -> dict:
    return {"ok": True, "service": "agent-api"}


@app.post("/ask")
def ask(req: AskRequest) -> dict:
    if "password" in req.question.lower():
        raise HTTPException(status_code=400, detail="Sensitive operation blocked")

    job_id = str(uuid.uuid4())
    now = datetime.now(timezone.utc).isoformat()

    table.put_item(Item={
        "pk": f"USER#{req.user_id}",
        "sk": f"JOB#{job_id}",
        "status": "queued",
        "question": req.question,
        "created_at": now,
    })

    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"job_id": job_id, "user_id": req.user_id, "question": req.question}),
        MessageAttributes={"event_type": {"DataType": "String", "StringValue": "agent_request"}},
    )
    return {"job_id": job_id, "status": "queued"}


@app.get("/jobs/{user_id}/{job_id}")
def get_job(user_id: str, job_id: str) -> dict:
    resp = table.get_item(Key={"pk": f"USER#{user_id}", "sk": f"JOB#{job_id}"})
    item = resp.get("Item")
    if not item:
        raise HTTPException(status_code=404, detail="Job not found")
    return item


handler = Mangum(app)

app/worker.py

import json
import os

import boto3

bedrock = boto3.client("bedrock-runtime", region_name=os.environ.get("AWS_REGION", "us-east-1"))
ddb = boto3.resource("dynamodb")

table = ddb.Table(os.environ["TABLE_NAME"])
MODEL_ID = os.environ.get("MODEL_ID", "amazon.nova-lite-v1:0")


def _invoke_model(prompt: str) -> str:
    body = {
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 600, "temperature": 0.2}
    }
    response = bedrock.invoke_model(
        modelId=MODEL_ID,
        body=json.dumps(body),
        accept="application/json",
        contentType="application/json",
    )
    payload = json.loads(response["body"].read())
    return payload.get("output", {}).get("message", {}).get("content", [{}])[0].get("text", "")


def lambda_handler(event, context):
    for record in event.get("Records", []):
        msg = json.loads(record["body"])
        user_id = msg["user_id"]
        job_id = msg["job_id"]
        question = msg["question"]

        try:
            answer = _invoke_model(
                f"You are an internal support assistant. Answer safely and concisely. Question: {question}"
            )
        except Exception:
            # Mark the job failed, then re-raise so SQS retries the message
            # and eventually routes it to the DLQ.
            table.update_item(
                Key={"pk": f"USER#{user_id}", "sk": f"JOB#{job_id}"},
                UpdateExpression="SET #s=:s",
                ExpressionAttributeNames={"#s": "status"},
                ExpressionAttributeValues={":s": "failed"},
            )
            raise

        table.update_item(
            Key={"pk": f"USER#{user_id}", "sk": f"JOB#{job_id}"},
            UpdateExpression="SET #s=:s, answer=:a",
            ExpressionAttributeNames={"#s": "status"},
            ExpressionAttributeValues={":s": "done", ":a": answer},
        )
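Bedrock invocations can be throttled under burst load. A minimal retry-with-backoff wrapper, shown here as a sketch rather than part of the worker above, would look like:

```python
import random
import time


def with_backoff(fn, max_attempts: int = 4, base_delay: float = 0.5):
    """Call fn(); on failure, retry with exponential backoff and jitter.

    Re-raises the last exception once attempts are exhausted, so the Lambda
    invocation still fails and SQS redelivery/DLQ semantics apply.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Exponential delay with jitter to avoid synchronized retries.
            time.sleep(base_delay * (2 ** attempt) * (0.5 + random.random() / 2))
```

In the worker, the model call would become `with_backoff(lambda: _invoke_model(prompt))`; for production, catching only throttling-class errors rather than bare `Exception` is preferable.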

requirements.txt

fastapi==0.115.0
mangum==0.17.0
boto3==1.35.0

7) Package and deploy Lambda functions

Bash:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt -t package
cp -r app package/
cd package && zip -r ../agent-api.zip . && cd ..

aws lambda create-function \
  --function-name "$API_FN" \
  --runtime python3.12 \
  --handler app.main.handler \
  --role "$ROLE_ARN" \
  --timeout 29 \
  --memory-size 512 \
  --zip-file fileb://agent-api.zip \
  --environment "Variables={TABLE_NAME=$TABLE_NAME,QUEUE_URL=$(aws sqs get-queue-url --queue-name $QUEUE_NAME --query QueueUrl --output text),SECRET_NAME=$PROJECT/config}"

aws lambda create-function \
  --function-name "$WORKER_FN" \
  --runtime python3.12 \
  --handler app.worker.lambda_handler \
  --role "$ROLE_ARN" \
  --timeout 120 \
  --memory-size 1024 \
  --zip-file fileb://agent-api.zip \
  --environment "Variables={TABLE_NAME=$TABLE_NAME,MODEL_ID=amazon.nova-lite-v1:0}"
PowerShell:

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt -t package
Copy-Item -Recurse app package
Compress-Archive -Path package\* -DestinationPath agent-api.zip -Force

$queueUrl = aws sqs get-queue-url --queue-name $env:QUEUE_NAME --query QueueUrl --output text

aws lambda create-function `
  --function-name $env:API_FN `
  --runtime python3.12 `
  --handler app.main.handler `
  --role $ROLE_ARN `
  --timeout 29 `
  --memory-size 512 `
  --zip-file fileb://agent-api.zip `
  --environment "Variables={TABLE_NAME=$($env:TABLE_NAME),QUEUE_URL=$queueUrl,SECRET_NAME=$($env:PROJECT)/config}"

aws lambda create-function `
  --function-name $env:WORKER_FN `
  --runtime python3.12 `
  --handler app.worker.lambda_handler `
  --role $ROLE_ARN `
  --timeout 120 `
  --memory-size 1024 `
  --zip-file fileb://agent-api.zip `
  --environment "Variables={TABLE_NAME=$($env:TABLE_NAME),MODEL_ID=amazon.nova-lite-v1:0}"

8) Expose API with JWT auth and WAF

API_ID=$(aws apigatewayv2 create-api \
  --name "${PROJECT}-http" \
  --protocol-type HTTP \
  --target "arn:aws:lambda:${AWS_REGION}:${ACCOUNT_ID}:function:${API_FN}" \
  --query ApiId --output text)

aws lambda add-permission \
  --function-name "$API_FN" \
  --statement-id apigw-access \
  --action lambda:InvokeFunction \
  --principal apigateway.amazonaws.com \
  --source-arn "arn:aws:execute-api:${AWS_REGION}:${ACCOUNT_ID}:${API_ID}/*/*"

AUTH_ID=$(aws apigatewayv2 create-authorizer \
  --api-id "$API_ID" \
  --authorizer-type JWT \
  --name employee-jwt \
  --identity-source '$request.header.Authorization' \
  --jwt-configuration Audience=internal-support,Issuer=https://id.example.com \
  --query AuthorizerId --output text)

aws apigatewayv2 update-route \
  --api-id "$API_ID" \
  --route-id "$(aws apigatewayv2 get-routes --api-id $API_ID --query 'Items[?RouteKey==`$default`].RouteId' --output text)" \
  --authorization-type JWT \
  --authorizer-id "$AUTH_ID"
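Once the JWT authorizer is attached, the Lambda handler receives the already-verified claims on the event. A helper like this (a sketch, assuming the HTTP API v2 payload format) extracts them without re-validating the signature:

```python
def jwt_claims(event: dict) -> dict:
    """Read claims that the API Gateway JWT authorizer has already verified.

    HTTP APIs pass verified claims to the Lambda integration under
    requestContext.authorizer.jwt.claims, so the function itself never
    handles raw tokens or signatures.
    """
    return (
        event.get("requestContext", {})
        .get("authorizer", {})
        .get("jwt", {})
        .get("claims", {})
    )
```

Inside the FastAPI app, these claims (for example `sub` or a groups claim) are what should drive the tool allowlist check, rather than a caller-supplied `user_id`.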

Note on WAF: AWS WAF web ACLs associate with REST API stages, not with HTTP APIs. To put WAF in front of this HTTP API, place a CloudFront distribution in front of it and associate the Web ACL with the distribution.

9) Connect worker Lambda to SQS

QUEUE_URL=$(aws sqs get-queue-url --queue-name "$QUEUE_NAME" --query QueueUrl --output text)

QUEUE_ARN=$(aws sqs get-queue-attributes \
  --queue-url "$QUEUE_URL" \
  --attribute-names QueueArn \
  --query Attributes.QueueArn --output text)

aws lambda create-event-source-mapping \
  --function-name "$WORKER_FN" \
  --event-source-arn "$QUEUE_ARN" \
  --batch-size 5

10) Monitoring and alerting

Create CloudWatch alarms:

  • Lambda Errors > 0 for 5 minutes
  • Lambda Duration p95 threshold
  • SQS ApproximateAgeOfOldestMessage threshold
  • API Gateway 5xx > threshold
Example alarm for worker errors:

aws cloudwatch put-metric-alarm \
  --alarm-name "${PROJECT}-worker-errors" \
  --namespace AWS/Lambda \
  --metric-name Errors \
  --dimensions Name=FunctionName,Value="$WORKER_FN" \
  --statistic Sum --period 60 --evaluation-periods 5 --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --alarm-actions "arn:aws:sns:${AWS_REGION}:${ACCOUNT_ID}:platform-alerts"

Also emit structured JSON logs with job_id, user_id, model_id, latency_ms, and token_estimate.
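A minimal emitter for such log lines might look like this; the field names match the list above, while the helper itself is a sketch:

```python
import json
import time


def log_event(job_id: str, user_id: str, model_id: str,
              started_at: float, token_estimate: int) -> str:
    """Emit one structured JSON log line.

    Anything printed to stdout in Lambda lands in CloudWatch Logs, and JSON
    lines let CloudWatch Logs Insights filter and aggregate on any field.
    """
    line = json.dumps({
        "job_id": job_id,
        "user_id": user_id,
        "model_id": model_id,
        "latency_ms": round((time.monotonic() - started_at) * 1000, 1),
        "token_estimate": token_estimate,
    })
    print(line)
    return line
```

The worker would capture `started_at = time.monotonic()` before the Bedrock call and log once per job, giving per-request latency and token trends without any extra agent.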

11) Cost optimization playbook

  • Start with small models (for example, Nova Lite class) and route hard queries to larger models only when needed.
  • Enforce maxTokens and prompt length limits at API level.
  • Cache deterministic responses in DynamoDB (short TTL for FAQ-style questions).
  • Batch non-urgent tasks through SQS.
  • Use CloudWatch dashboards for tokens-per-team and cost-per-request trends.
  • Create AWS Budgets alerts.
Example budget with an 80% notification (repeat the notification entry for the 50% and 100% thresholds):

aws budgets create-budget \
  --account-id "$ACCOUNT_ID" \
  --budget '{
    "BudgetName":"internal-support-agent-monthly",
    "BudgetLimit":{"Amount":"300","Unit":"USD"},
    "TimeUnit":"MONTHLY",
    "BudgetType":"COST"
  }' \
  --notifications-with-subscribers '[
    {"Notification":{"NotificationType":"ACTUAL","ComparisonOperator":"GREATER_THAN","Threshold":80,"ThresholdType":"PERCENTAGE"},
     "Subscribers":[{"SubscriptionType":"EMAIL","Address":"oncall@example.com"}]}
  ]'
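The DynamoDB response cache from the playbook can be sketched as a deterministic key plus a TTL attribute. The item shape here is an assumption, and `ttl` only works if it is configured as the table's TTL attribute:

```python
import hashlib
import time

CACHE_TTL_SECONDS = 3600  # short TTL: FAQ-style answers go stale quickly


def cache_key(question: str) -> str:
    """Deterministic key: the same normalized question maps to one cache entry."""
    normalized = " ".join(question.lower().split())
    return "CACHE#" + hashlib.sha256(normalized.encode()).hexdigest()[:32]


def cache_item(question: str, answer: str) -> dict:
    # DynamoDB deletes the item automatically once the `ttl` epoch passes,
    # provided `ttl` is registered as the table's TTL attribute.
    return {
        "pk": cache_key(question),
        "sk": "ANSWER",
        "answer": answer,
        "ttl": int(time.time()) + CACHE_TTL_SECONDS,
    }
```

Before enqueueing a job, the API would `get_item` on `cache_key(question)` and return the cached answer on a hit, skipping the model invocation entirely.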

Always verify pricing with:

  • https://aws.amazon.com/bedrock/pricing/
  • https://aws.amazon.com/lambda/pricing/
  • https://aws.amazon.com/api-gateway/pricing/
  • https://aws.amazon.com/sqs/pricing/
  • https://aws.amazon.com/dynamodb/pricing/

12) Production readiness checklist

  • JWT auth enforced on all routes (no anonymous fallback)
  • Secrets only in Secrets Manager or SSM Parameter Store
  • IAM permissions scoped to exact tables/queues/secrets
  • Tool calls allowlist internal domains/actions only
  • CloudWatch alarms wired to SNS/on-call
  • SQS DLQ configured and tested
  • Replay-safe job processing (idempotency key)
  • Budget alerts active at 50%, 80%, 100%
  • Runbook exists for model outage, queue backlog, and auth failure
  • Incident drill completed before broad rollout
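The idempotency-key item in the checklist can be sketched as a content hash of the SQS message body:

```python
import hashlib
import json


def idempotency_key(message: dict) -> str:
    """Stable key for an SQS message body: replays of the same job hash identically.

    sort_keys makes the serialization canonical, so field order in the
    message does not change the key.
    """
    canonical = json.dumps(message, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

# The worker would then write results conditionally, e.g. with a DynamoDB
# ConditionExpression of attribute_not_exists(pk) on an item keyed by this
# hash, so a redelivered message cannot apply its side effects twice.
```

This matters because SQS standard queues guarantee at-least-once delivery: the same job can legitimately arrive twice, and the conditional write turns the duplicate into a no-op.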

Final recommendation

This Lambda-first architecture is usually the best production baseline for internal AI agents in startups: low initial cost, secure controls, and a straightforward migration path to ECS/Fargate or multi-region designs when load and complexity grow.