
How to Deploy AI Agents in Production on AWS

Apr 07, 2026 · 11 min read


AWS · Agentic AI · Cost Optimization


Scenario

A startup wants to deploy AI agents for internal support automation. The initial target is low monthly cost, but the architecture must already include secure access, auditable actions, and a path to scale when usage grows.

Problem and Business Context

Most teams start with a single app server calling an LLM API directly. That works for demos, but production needs more:

  • identity-aware access control (who can call which tool)
  • secrets handling without hardcoded keys
  • queueing for spikes
  • observability for latency, failures, and token usage
  • clear blast-radius boundaries so an agent bug does not become an infrastructure incident

The business objective is to automate repetitive internal support tasks (policy lookup, ticket summarization, and approved action execution) while keeping the first 3-6 months of spend predictable.
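The "who can call which tool" requirement can be made concrete with a small allowlist check in the agent layer. The group and tool names below are hypothetical, just to illustrate the shape of the check:

```python
# Hypothetical mapping of identity-provider groups to permitted tools.
# Group and tool names are illustrative, not from any real configuration.
TOOL_ALLOWLIST = {
    "support-l1": {"policy_lookup", "ticket_summarize"},
    "support-l2": {"policy_lookup", "ticket_summarize", "ticket_update"},
}


def is_tool_call_allowed(user_groups: list[str], tool: str) -> bool:
    """Return True only if at least one of the caller's groups permits the tool."""
    return any(tool in TOOL_ALLOWLIST.get(group, set()) for group in user_groups)
```

The agent API would run this check before enqueueing a job, so an unauthorized tool request is rejected at the edge rather than discovered in the worker.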

Architecture Options and Trade-offs

Option A: Lambda + API Gateway + SQS + DynamoDB (recommended for this scenario)

  • Pros: lowest ops burden, scales to zero, strong IAM integration, good for bursty traffic
  • Cons: cold starts, careful timeout/async design needed

Option B: ECS Fargate always-on API + worker

  • Pros: stable low latency, easier long-running workflows
  • Cons: higher baseline cost than Lambda at low traffic

Option C: EKS + service mesh

  • Pros: maximum control
  • Cons: highest operational overhead and engineering time

For a cost-sensitive startup, Option A is the best first production shape.
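To compare options on cost, a rough estimator helps. This sketch deliberately takes per-token prices as inputs rather than hardcoding them; look current prices up on the official Bedrock pricing page:

```python
def estimate_monthly_llm_cost(
    requests_per_month: int,
    avg_input_tokens: int,
    avg_output_tokens: int,
    price_per_1k_input: float,   # USD, from the official Bedrock pricing page
    price_per_1k_output: float,  # USD, from the official Bedrock pricing page
) -> float:
    """Rough model-invocation cost only; excludes Lambda, API Gateway, SQS, DynamoDB."""
    input_cost = requests_per_month * avg_input_tokens / 1000 * price_per_1k_input
    output_cost = requests_per_month * avg_output_tokens / 1000 * price_per_1k_output
    return round(input_cost + output_cost, 2)
```

Running this for a few traffic scenarios (pilot, full rollout, worst case) gives a defensible spend range before any infrastructure is built.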

Recommended AWS Architecture

graph TD
  U[Internal Users] --> IdP[OIDC IdP or IAM Identity Center]
  U --> APIGW[API Gateway HTTP API + JWT Authorizer]
  APIGW --> L1[Lambda: FastAPI Agent API]
  L1 --> DDB[(DynamoDB Sessions + Audit State)]
  L1 --> SQS[(SQS Agent Jobs)]
  L1 --> SM[Secrets Manager]
  L1 --> BR[Amazon Bedrock]
  SQS --> L2[Lambda: Tool Worker]
  L2 --> DDB
  L2 --> S3[(S3 Knowledge/Artifacts)]
  L2 --> BR
  L1 --> CW[CloudWatch Logs + Metrics + Alarms]
  L2 --> CW
  CW --> SNS[SNS On-call Notifications]
  WAF[AWS WAF] --> APIGW

Production Design Notes

  • API ingress: API Gateway HTTP API with JWT authorizer.
  • Agent API: FastAPI in Lambda via Mangum.
  • Async tool execution: SQS + worker Lambda.
  • State: DynamoDB single-table design for sessions/jobs.
  • Secrets: AWS Secrets Manager with KMS encryption.
  • LLM: Amazon Bedrock model invocation with strict IAM resource scope.
  • Audit trail: CloudWatch Logs + DynamoDB immutable event records.
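The single-table design and the immutable audit records can be sketched as key-builder helpers. The USER#/JOB# shapes match the handlers shown later in this post; the AUDIT# shape is an assumed convention for append-only events:

```python
from datetime import datetime, timezone


def job_key(user_id: str, job_id: str) -> dict:
    # Matches the key shape used by the API and worker handlers in this post.
    return {"pk": f"USER#{user_id}", "sk": f"JOB#{job_id}"}


def audit_event_item(user_id: str, job_id: str, action: str) -> dict:
    # Assumed shape for append-only audit records: the timestamp in the sort
    # key makes each event unique, so records are written once, never updated.
    ts = datetime.now(timezone.utc).isoformat()
    return {
        "pk": f"USER#{user_id}",
        "sk": f"AUDIT#{ts}#{job_id}",
        "action": action,
    }
```

Keeping sessions, jobs, and audit events under one partition key per user means a single Query returns a user's full history in key order.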

Step-by-Step Implementation

1) Prerequisites

  • AWS CLI v2 configured
  • Python 3.12+
  • jq (bash workflow)
  • An OIDC provider for employee auth

Pricing note: verify all current prices in the official AWS Pricing pages before rollout.

2) Bootstrap environment variables

Bash:

export AWS_REGION=us-east-1
export PROJECT=internal-support-agent
export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export ARTIFACT_BUCKET=${PROJECT}-${ACCOUNT_ID}-${AWS_REGION}
export API_FN=${PROJECT}-api
export WORKER_FN=${PROJECT}-worker
export TABLE_NAME=${PROJECT}-state
export QUEUE_NAME=${PROJECT}-jobs
PowerShell:

$env:AWS_REGION = "us-east-1"
$env:PROJECT = "internal-support-agent"
$env:ACCOUNT_ID = (aws sts get-caller-identity --query Account --output text)
$env:ARTIFACT_BUCKET = "$($env:PROJECT)-$($env:ACCOUNT_ID)-$($env:AWS_REGION)"
$env:API_FN = "$($env:PROJECT)-api"
$env:WORKER_FN = "$($env:PROJECT)-worker"
$env:TABLE_NAME = "$($env:PROJECT)-state"
$env:QUEUE_NAME = "$($env:PROJECT)-jobs"

3) Create core data and messaging resources

Bash:

aws s3api create-bucket \
  --bucket "$ARTIFACT_BUCKET" \
  --region "$AWS_REGION"

aws dynamodb create-table \
  --table-name "$TABLE_NAME" \
  --attribute-definitions AttributeName=pk,AttributeType=S AttributeName=sk,AttributeType=S \
  --key-schema AttributeName=pk,KeyType=HASH AttributeName=sk,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST \
  --sse-specification Enabled=true

# Visibility timeout should be roughly 6x the worker Lambda timeout (AWS guidance for SQS event sources).
aws sqs create-queue \
  --queue-name "$QUEUE_NAME" \
  --attributes VisibilityTimeout=720,KmsMasterKeyId=alias/aws/sqs
PowerShell:

aws s3api create-bucket --bucket $env:ARTIFACT_BUCKET --region $env:AWS_REGION

aws dynamodb create-table `
  --table-name $env:TABLE_NAME `
  --attribute-definitions AttributeName=pk,AttributeType=S AttributeName=sk,AttributeType=S `
  --key-schema AttributeName=pk,KeyType=HASH AttributeName=sk,KeyType=RANGE `
  --billing-mode PAY_PER_REQUEST `
  --sse-specification Enabled=true

# Visibility timeout should be roughly 6x the worker Lambda timeout (AWS guidance for SQS event sources).
aws sqs create-queue `
  --queue-name $env:QUEUE_NAME `
  --attributes VisibilityTimeout=720,KmsMasterKeyId=alias/aws/sqs

4) Store secrets safely (no hardcoded credentials)

Bash:

aws secretsmanager create-secret \
  --name "${PROJECT}/config" \
  --description "Agent runtime secrets" \
  --secret-string '{"BEDROCK_MODEL_ID":"amazon.nova-lite-v1:0","ALLOWED_TOOL_DOMAINS":"jira.internal,confluence.internal"}'
PowerShell:

aws secretsmanager create-secret `
  --name "$($env:PROJECT)/config" `
  --description "Agent runtime secrets" `
  --secret-string '{"BEDROCK_MODEL_ID":"amazon.nova-lite-v1:0","ALLOWED_TOOL_DOMAINS":"jira.internal,confluence.internal"}'

5) IAM least-privilege role for Lambda

trust-policy.json

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"Service": "lambda.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }
  ]
}

Create role and attach minimum policies.

ROLE_ARN=$(aws iam create-role \
  --role-name "${PROJECT}-lambda-role" \
  --assume-role-policy-document file://trust-policy.json \
  --query Role.Arn --output text)

cat > permissions-policy.json << 'JSON'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Logs",
      "Effect": "Allow",
      "Action": ["logs:CreateLogGroup","logs:CreateLogStream","logs:PutLogEvents"],
      "Resource": "*"
    },
    {
      "Sid": "DynamoAccess",
      "Effect": "Allow",
      "Action": ["dynamodb:PutItem","dynamodb:GetItem","dynamodb:UpdateItem","dynamodb:Query"],
      "Resource": "arn:aws:dynamodb:*:*:table/internal-support-agent-state"
    },
    {
      "Sid": "QueueAccess",
      "Effect": "Allow",
      "Action": ["sqs:SendMessage","sqs:ReceiveMessage","sqs:DeleteMessage","sqs:GetQueueAttributes"],
      "Resource": "arn:aws:sqs:*:*:internal-support-agent-jobs"
    },
    {
      "Sid": "SecretsRead",
      "Effect": "Allow",
      "Action": ["secretsmanager:GetSecretValue"],
      "Resource": "arn:aws:secretsmanager:*:*:secret:internal-support-agent/config*"
    },
    {
      "Sid": "BedrockInvoke",
      "Effect": "Allow",
      "Action": ["bedrock:InvokeModel","bedrock:InvokeModelWithResponseStream"],
      "Resource": "*"
    }
  ]
}
JSON

aws iam put-role-policy \
  --role-name "${PROJECT}-lambda-role" \
  --policy-name "${PROJECT}-runtime-policy" \
  --policy-document file://permissions-policy.json

6) FastAPI agent API code

app/main.py

import json
import os
import uuid
from datetime import datetime, timezone

import boto3
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field
from mangum import Mangum

app = FastAPI(title="Internal Support Agent API")

ddb = boto3.resource("dynamodb")
sqs = boto3.client("sqs")
secrets = boto3.client("secretsmanager")

TABLE_NAME = os.environ["TABLE_NAME"]
QUEUE_URL = os.environ["QUEUE_URL"]
SECRET_NAME = os.environ["SECRET_NAME"]

table = ddb.Table(TABLE_NAME)


class AskRequest(BaseModel):
    user_id: str = Field(min_length=3)
    question: str = Field(min_length=3, max_length=4000)


@app.get("/health")
def health() -> dict:
    return {"ok": True, "service": "agent-api"}


@app.post("/ask")
def ask(req: AskRequest) -> dict:
    if "password" in req.question.lower():
        raise HTTPException(status_code=400, detail="Sensitive operation blocked")

    job_id = str(uuid.uuid4())
    now = datetime.now(timezone.utc).isoformat()

    table.put_item(Item={
        "pk": f"USER#{req.user_id}",
        "sk": f"JOB#{job_id}",
        "status": "queued",
        "question": req.question,
        "created_at": now,
    })

    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"job_id": job_id, "user_id": req.user_id, "question": req.question}),
        MessageAttributes={"event_type": {"DataType": "String", "StringValue": "agent_request"}},
    )
    return {"job_id": job_id, "status": "queued"}


@app.get("/jobs/{user_id}/{job_id}")
def get_job(user_id: str, job_id: str) -> dict:
    resp = table.get_item(Key={"pk": f"USER#{user_id}", "sk": f"JOB#{job_id}"})
    item = resp.get("Item")
    if not item:
        raise HTTPException(status_code=404, detail="Job not found")
    return item


handler = Mangum(app)

app/worker.py

import json
import os

import boto3

bedrock = boto3.client("bedrock-runtime", region_name=os.environ.get("AWS_REGION", "us-east-1"))
ddb = boto3.resource("dynamodb")

table = ddb.Table(os.environ["TABLE_NAME"])
MODEL_ID = os.environ.get("MODEL_ID", "amazon.nova-lite-v1:0")


def _invoke_model(prompt: str) -> str:
    body = {
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 600, "temperature": 0.2}
    }
    response = bedrock.invoke_model(
        modelId=MODEL_ID,
        body=json.dumps(body),
        accept="application/json",
        contentType="application/json",
    )
    payload = json.loads(response["body"].read())
    return payload.get("output", {}).get("message", {}).get("content", [{}])[0].get("text", "")


def lambda_handler(event, context):
    for record in event.get("Records", []):
        msg = json.loads(record["body"])
        user_id = msg["user_id"]
        job_id = msg["job_id"]
        question = msg["question"]

        try:
            answer = _invoke_model(
                f"You are an internal support assistant. Answer safely and concisely. Question: {question}"
            )
        except Exception:
            # Mark the job failed, then re-raise so SQS retries the message
            # and eventually routes it to the DLQ.
            table.update_item(
                Key={"pk": f"USER#{user_id}", "sk": f"JOB#{job_id}"},
                UpdateExpression="SET #s=:s",
                ExpressionAttributeNames={"#s": "status"},
                ExpressionAttributeValues={":s": "failed"},
            )
            raise

        table.update_item(
            Key={"pk": f"USER#{user_id}", "sk": f"JOB#{job_id}"},
            UpdateExpression="SET #s=:s, answer=:a",
            ExpressionAttributeNames={"#s": "status"},
            ExpressionAttributeValues={":s": "done", ":a": answer},
        )
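Bedrock invocations can be throttled under burst load. A minimal retry-with-backoff wrapper, shown here as a sketch rather than part of the worker above, would look like:

```python
import random
import time


def with_backoff(fn, max_attempts: int = 4, base_delay: float = 0.5):
    """Call fn(); on failure, retry with exponential backoff and jitter.

    Re-raises the last exception once attempts are exhausted, so the Lambda
    invocation still fails and SQS redelivery/DLQ semantics apply.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Exponential delay with jitter to avoid synchronized retries.
            time.sleep(base_delay * (2 ** attempt) * (0.5 + random.random() / 2))
```

In the worker, the model call would become `with_backoff(lambda: _invoke_model(prompt))`; for production, catching only throttling-class errors rather than bare `Exception` is preferable.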

requirements.txt

fastapi==0.115.0
mangum==0.17.0
boto3==1.35.0

7) Package and deploy Lambda functions

Bash:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt -t package
cp -r app package/
cd package && zip -r ../agent-api.zip . && cd ..

aws lambda create-function \
  --function-name "$API_FN" \
  --runtime python3.12 \
  --handler app.main.handler \
  --role "$ROLE_ARN" \
  --timeout 29 \
  --memory-size 512 \
  --zip-file fileb://agent-api.zip \
  --environment "Variables={TABLE_NAME=$TABLE_NAME,QUEUE_URL=$(aws sqs get-queue-url --queue-name $QUEUE_NAME --query QueueUrl --output text),SECRET_NAME=$PROJECT/config}"

aws lambda create-function \
  --function-name "$WORKER_FN" \
  --runtime python3.12 \
  --handler app.worker.lambda_handler \
  --role "$ROLE_ARN" \
  --timeout 120 \
  --memory-size 1024 \
  --zip-file fileb://agent-api.zip \
  --environment "Variables={TABLE_NAME=$TABLE_NAME,MODEL_ID=amazon.nova-lite-v1:0}"
PowerShell:

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt -t package
Copy-Item -Recurse app package
Compress-Archive -Path package\* -DestinationPath agent-api.zip -Force

$queueUrl = aws sqs get-queue-url --queue-name $env:QUEUE_NAME --query QueueUrl --output text

aws lambda create-function `
  --function-name $env:API_FN `
  --runtime python3.12 `
  --handler app.main.handler `
  --role $ROLE_ARN `
  --timeout 29 `
  --memory-size 512 `
  --zip-file fileb://agent-api.zip `
  --environment "Variables={TABLE_NAME=$($env:TABLE_NAME),QUEUE_URL=$queueUrl,SECRET_NAME=$($env:PROJECT)/config}"

aws lambda create-function `
  --function-name $env:WORKER_FN `
  --runtime python3.12 `
  --handler app.worker.lambda_handler `
  --role $ROLE_ARN `
  --timeout 120 `
  --memory-size 1024 `
  --zip-file fileb://agent-api.zip `
  --environment "Variables={TABLE_NAME=$($env:TABLE_NAME),MODEL_ID=amazon.nova-lite-v1:0}"

8) Expose API with JWT auth and WAF

API_ID=$(aws apigatewayv2 create-api \
  --name "${PROJECT}-http" \
  --protocol-type HTTP \
  --target "arn:aws:lambda:${AWS_REGION}:${ACCOUNT_ID}:function:${API_FN}" \
  --query ApiId --output text)

aws lambda add-permission \
  --function-name "$API_FN" \
  --statement-id apigw-access \
  --action lambda:InvokeFunction \
  --principal apigateway.amazonaws.com \
  --source-arn "arn:aws:execute-api:${AWS_REGION}:${ACCOUNT_ID}:${API_ID}/*/*"

AUTH_ID=$(aws apigatewayv2 create-authorizer \
  --api-id "$API_ID" \
  --authorizer-type JWT \
  --name employee-jwt \
  --identity-source '$request.header.Authorization' \
  --jwt-configuration Audience=internal-support,Issuer=https://id.example.com \
  --query AuthorizerId --output text)

aws apigatewayv2 update-route \
  --api-id "$API_ID" \
  --route-id "$(aws apigatewayv2 get-routes --api-id $API_ID --query 'Items[?RouteKey==`$default`].RouteId' --output text)" \
  --authorization-type JWT \
  --authorizer-id "$AUTH_ID"
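Once the JWT authorizer is attached, the Lambda handler receives the already-verified claims on the event. A helper like this (a sketch, assuming the HTTP API v2 payload format) extracts them without re-validating the signature:

```python
def jwt_claims(event: dict) -> dict:
    """Read claims that the API Gateway JWT authorizer has already verified.

    HTTP APIs pass verified claims to the Lambda integration under
    requestContext.authorizer.jwt.claims, so the function itself never
    handles raw tokens or signatures.
    """
    return (
        event.get("requestContext", {})
        .get("authorizer", {})
        .get("jwt", {})
        .get("claims", {})
    )
```

Inside the FastAPI app, these claims (for example `sub` or a groups claim) are what should drive the tool allowlist check, rather than a caller-supplied `user_id`.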

Note on WAF: AWS WAF web ACLs associate with REST API stages, not with HTTP APIs. To put WAF in front of this HTTP API, place a CloudFront distribution in front of it and associate the Web ACL with the distribution.

9) Connect worker Lambda to SQS

QUEUE_URL=$(aws sqs get-queue-url --queue-name "$QUEUE_NAME" --query QueueUrl --output text)

QUEUE_ARN=$(aws sqs get-queue-attributes \
  --queue-url "$QUEUE_URL" \
  --attribute-names QueueArn \
  --query Attributes.QueueArn --output text)

aws lambda create-event-source-mapping \
  --function-name "$WORKER_FN" \
  --event-source-arn "$QUEUE_ARN" \
  --batch-size 5

10) Monitoring and alerting

Create CloudWatch alarms:

  • Lambda Errors > 0 for 5 minutes
  • Lambda Duration p95 threshold
  • SQS ApproximateAgeOfOldestMessage threshold
  • API Gateway 5xx > threshold
Example alarm for worker errors:

aws cloudwatch put-metric-alarm \
  --alarm-name "${PROJECT}-worker-errors" \
  --namespace AWS/Lambda \
  --metric-name Errors \
  --dimensions Name=FunctionName,Value="$WORKER_FN" \
  --statistic Sum --period 60 --evaluation-periods 5 --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --alarm-actions "arn:aws:sns:${AWS_REGION}:${ACCOUNT_ID}:platform-alerts"

Also emit structured JSON logs with job_id, user_id, model_id, latency_ms, and token_estimate.
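A minimal emitter for such log lines might look like this; the field names match the list above, while the helper itself is a sketch:

```python
import json
import time


def log_event(job_id: str, user_id: str, model_id: str,
              started_at: float, token_estimate: int) -> str:
    """Emit one structured JSON log line.

    Anything printed to stdout in Lambda lands in CloudWatch Logs, and JSON
    lines let CloudWatch Logs Insights filter and aggregate on any field.
    """
    line = json.dumps({
        "job_id": job_id,
        "user_id": user_id,
        "model_id": model_id,
        "latency_ms": round((time.monotonic() - started_at) * 1000, 1),
        "token_estimate": token_estimate,
    })
    print(line)
    return line
```

The worker would capture `started_at = time.monotonic()` before the Bedrock call and log once per job, giving per-request latency and token trends without any extra agent.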

11) Cost optimization playbook

  • Start with small models (for example, Nova Lite class) and route hard queries to larger models only when needed.
  • Enforce maxTokens and prompt length limits at API level.
  • Cache deterministic responses in DynamoDB (short TTL for FAQ-style questions).
  • Batch non-urgent tasks through SQS.
  • Use CloudWatch dashboards for tokens-per-team and cost-per-request trends.
  • Create AWS Budgets alerts.
Example budget with an 80% notification (repeat the notification entry for the 50% and 100% thresholds):

aws budgets create-budget \
  --account-id "$ACCOUNT_ID" \
  --budget '{
    "BudgetName":"internal-support-agent-monthly",
    "BudgetLimit":{"Amount":"300","Unit":"USD"},
    "TimeUnit":"MONTHLY",
    "BudgetType":"COST"
  }' \
  --notifications-with-subscribers '[
    {"Notification":{"NotificationType":"ACTUAL","ComparisonOperator":"GREATER_THAN","Threshold":80,"ThresholdType":"PERCENTAGE"},
     "Subscribers":[{"SubscriptionType":"EMAIL","Address":"oncall@example.com"}]}
  ]'
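The DynamoDB response cache from the playbook can be sketched as a deterministic key plus a TTL attribute. The item shape here is an assumption, and `ttl` only works if it is configured as the table's TTL attribute:

```python
import hashlib
import time

CACHE_TTL_SECONDS = 3600  # short TTL: FAQ-style answers go stale quickly


def cache_key(question: str) -> str:
    """Deterministic key: the same normalized question maps to one cache entry."""
    normalized = " ".join(question.lower().split())
    return "CACHE#" + hashlib.sha256(normalized.encode()).hexdigest()[:32]


def cache_item(question: str, answer: str) -> dict:
    # DynamoDB deletes the item automatically once the `ttl` epoch passes,
    # provided `ttl` is registered as the table's TTL attribute.
    return {
        "pk": cache_key(question),
        "sk": "ANSWER",
        "answer": answer,
        "ttl": int(time.time()) + CACHE_TTL_SECONDS,
    }
```

Before enqueueing a job, the API would `get_item` on `cache_key(question)` and return the cached answer on a hit, skipping the model invocation entirely.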

Always verify pricing with:

  • https://aws.amazon.com/bedrock/pricing/
  • https://aws.amazon.com/lambda/pricing/
  • https://aws.amazon.com/api-gateway/pricing/
  • https://aws.amazon.com/sqs/pricing/
  • https://aws.amazon.com/dynamodb/pricing/

12) Production readiness checklist

  • JWT auth enforced on all routes (no anonymous fallback)
  • Secrets only in Secrets Manager or SSM Parameter Store
  • IAM permissions scoped to exact tables/queues/secrets
  • Tool calls allowlist internal domains/actions only
  • CloudWatch alarms wired to SNS/on-call
  • SQS DLQ configured and tested
  • Replay-safe job processing (idempotency key)
  • Budget alerts active at 50%, 80%, 100%
  • Runbook exists for model outage, queue backlog, and auth failure
  • Incident drill completed before broad rollout
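The idempotency-key item in the checklist can be sketched as a content hash of the SQS message body:

```python
import hashlib
import json


def idempotency_key(message: dict) -> str:
    """Stable key for an SQS message body: replays of the same job hash identically.

    sort_keys makes the serialization canonical, so field order in the
    message does not change the key.
    """
    canonical = json.dumps(message, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

# The worker would then write results conditionally, e.g. with a DynamoDB
# ConditionExpression of attribute_not_exists(pk) on an item keyed by this
# hash, so a redelivered message cannot apply its side effects twice.
```

This matters because SQS standard queues guarantee at-least-once delivery: the same job can legitimately arrive twice, and the conditional write turns the duplicate into a no-op.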

Final recommendation

This Lambda-first architecture is usually the best production baseline for internal AI agents in startups: low initial cost, secure controls, and a straightforward migration path to ECS/Fargate or multi-region designs when load and complexity grow.