AI Coding Agents with DeepSeek Latest Model API on AWS and FastAPI

May 14, 2026·4 min read

Founder and Editor, Smash The Exam

Reviewed: 2026-05-26

This guide builds an internal AI coding assistant with the DeepSeek API, FastAPI, and AWS Lambda, and covers the guardrails that matter in production: scoped IAM, secrets handling, rate limiting, cost caps, and monitoring.

AWS, Agentic AI, LLM, DevOps

AI Focus 1: Risk controls worth enforcing early for predictable operations (AI Coding Agents)

A DevOps team wants an internal AI coding assistant that reviews code, explains errors, and suggests fixes using the latest available DeepSeek API.

Editorial review note for AI Coding Agents

This section was reviewed by a human editor to keep the recommendations actionable and technically grounded. Reviewed by: Med Amine Mahmoud. Last editorial review: 2026-05-26T16:10:01Z.

AI Focus 3: How to keep cost and reliability aligned for cleaner ownership (AI Coding Agents)

AI Focus 4: What to document for your team for measurable outcomes (AI Coding Agents)

export AWS_REGION=us-east-1
export PROJECT=deepseek-code-agent
export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export FN_NAME=${PROJECT}-api
export USAGE_TABLE=${PROJECT}-usage

$env:AWS_REGION = "us-east-1"
$env:PROJECT = "deepseek-code-agent"
$env:ACCOUNT_ID = (aws sts get-caller-identity --query Account --output text)
$env:FN_NAME = "$($env:PROJECT)-api"
$env:USAGE_TABLE = "$($env:PROJECT)-usage"

AI Focus 5: Where this architecture earns its value for fewer incident surprises (AI Coding Agents)

aws dynamodb create-table \
--table-name "$USAGE_TABLE" \
--attribute-definitions AttributeName=pk,AttributeType=S AttributeName=ts,AttributeType=S \
--key-schema AttributeName=pk,KeyType=HASH AttributeName=ts,KeyType=RANGE \
--billing-mode PAY_PER_REQUEST \
--sse-specification Enabled=true

aws secretsmanager create-secret \
--name "${PROJECT}/deepseek" \
--description "DeepSeek API key and config" \
--secret-string '{"DEEPSEEK_API_KEY":"REPLACE_ME","DEEPSEEK_MODEL":"deepseek-v4-flash"}'

aws dynamodb create-table `
--table-name $env:USAGE_TABLE `
--attribute-definitions AttributeName=pk,AttributeType=S AttributeName=ts,AttributeType=S `
--key-schema AttributeName=pk,KeyType=HASH AttributeName=ts,KeyType=RANGE `
--billing-mode PAY_PER_REQUEST `
--sse-specification Enabled=true

aws secretsmanager create-secret `
--name "$($env:PROJECT)/deepseek" `
--description "DeepSeek API key and config" `
--secret-string '{"DEEPSEEK_API_KEY":"REPLACE_ME","DEEPSEEK_MODEL":"deepseek-v4-flash"}'

AI Focus 6: Operational notes from real-world usage for this workload (AI Coding Agents)

policy-deepseek-agent.json

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Logs",
      "Effect": "Allow",
      "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
      "Resource": "*"
    },
    {
      "Sid": "ReadSecret",
      "Effect": "Allow",
      "Action": ["secretsmanager:GetSecretValue"],
      "Resource": "arn:aws:secretsmanager:*:*:secret:deepseek-code-agent/deepseek*"
    },
    {
      "Sid": "UsageTable",
      "Effect": "Allow",
      "Action": ["dynamodb:PutItem", "dynamodb:UpdateItem", "dynamodb:Query"],
      "Resource": "arn:aws:dynamodb:*:*:table/deepseek-code-agent-usage"
    }
  ]
}

Attach this policy to the Lambda execution role (${PROJECT}-lambda-role in the deployment commands).
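The policy above uses wildcards for region and account in its ARNs and "*" for the log resource. A minimal sketch of generating a tighter variant is shown here, assuming the function name follows the ${PROJECT}-api convention from the environment setup. The helper name build_policy is hypothetical, and the log-group ARN patterns should be checked against the IAM reference for CloudWatch Logs before use.

```python
import json


def build_policy(region: str, account_id: str, project: str) -> dict:
    """Build the execution-role policy with region- and account-scoped ARNs."""
    fn_name = f"{project}-api"  # assumption: matches FN_NAME from the setup step
    log_group = f"arn:aws:logs:{region}:{account_id}:log-group:/aws/lambda/{fn_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Logs",
                "Effect": "Allow",
                "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
                # The group itself plus everything under it (streams).
                "Resource": [log_group, f"{log_group}:*"],
            },
            {
                "Sid": "ReadSecret",
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                # Secrets Manager appends a random suffix to the secret name.
                "Resource": f"arn:aws:secretsmanager:{region}:{account_id}:secret:{project}/deepseek-*",
            },
            {
                "Sid": "UsageTable",
                "Effect": "Allow",
                "Action": ["dynamodb:PutItem", "dynamodb:UpdateItem", "dynamodb:Query"],
                "Resource": f"arn:aws:dynamodb:{region}:{account_id}:table/{project}-usage",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy("us-east-1", "123456789012", "deepseek-code-agent"), indent=2))
```

Running the script prints a JSON document that can be saved as policy-deepseek-agent.json; the account ID shown is a placeholder.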

AI Focus 7: How to avoid expensive rework for your runbook (AI Coding Agents)

app/main.py

import json
import os
import time
from datetime import datetime, timezone

import boto3
from fastapi import FastAPI, HTTPException, Header
from mangum import Mangum
from pydantic import BaseModel, Field
from openai import OpenAI

app = FastAPI(title="Internal DeepSeek Coding Agent")

secrets = boto3.client("secretsmanager")
ddb = boto3.resource("dynamodb")

SECRET_NAME = os.environ["SECRET_NAME"]
USAGE_TABLE = os.environ["USAGE_TABLE"]
MAX_PROMPT_CHARS = int(os.environ.get("MAX_PROMPT_CHARS", "12000"))
MAX_REQ_PER_MIN = int(os.environ.get("MAX_REQ_PER_MIN", "30"))

table = ddb.Table(USAGE_TABLE)


class ReviewRequest(BaseModel):
    repo: str = Field(min_length=2)
    diff: str = Field(min_length=5, max_length=50000)
    question: str = Field(min_length=3, max_length=6000)


def _load_secret() -> dict:
    payload = secrets.get_secret_value(SecretId=SECRET_NAME)
    return json.loads(payload["SecretString"])


def _rate_limit(actor: str) -> None:
    # One partition per actor per UTC minute; each request adds one item.
    minute = datetime.now(timezone.utc).strftime("%Y%m%d%H%M")
    key = f"ACTOR#{actor}#MIN#{minute}"
    table.put_item(Item={"pk": key, "ts": str(time.time())})

    # Count the items written in this minute, including the one just added.
    resp = table.query(
        KeyConditionExpression="pk = :pk",
        ExpressionAttributeValues={":pk": key},
        Select="COUNT",
    )
    if resp.get("Count", 0) > MAX_REQ_PER_MIN:
        raise HTTPException(status_code=429, detail="Rate limit exceeded")


@app.get("/health")
def health() -> dict:
    return {"ok": True}


@app.post("/review")
def review(req: ReviewRequest, x_employee_id: str = Header(default="unknown")) -> dict:
    # Count the request against the caller's per-minute quota before doing any work.
    _rate_limit(x_employee_id)

    if len(req.diff) > MAX_PROMPT_CHARS:
        raise HTTPException(status_code=400, detail="Diff too large; upload summarized diff")

    cfg = _load_secret()
    model = cfg.get("DEEPSEEK_MODEL", "deepseek-v4-flash")
    client = OpenAI(api_key=cfg["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")

    prompt = (
        "You are a senior code reviewer. Focus on correctness, security, and operational risk.\n"
        f"Repository: {req.repo}\n"
        f"Question: {req.question}\n"
        f"Diff:\n{req.diff[:MAX_PROMPT_CHARS]}"
    )

    started = time.time()
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1,
        max_tokens=1200,
    )
    latency_ms = int((time.time() - started) * 1000)

    answer = completion.choices[0].message.content
    usage = getattr(completion, "usage", None)

    # Audit record: actor, model, token usage, and latency.
    table.put_item(Item={
        "pk": f"REQ#{x_employee_id}",
        "ts": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "latency_ms": latency_ms,
        "prompt_tokens": getattr(usage, "prompt_tokens", 0) if usage else 0,
        "completion_tokens": getattr(usage, "completion_tokens", 0) if usage else 0,
    })

    return {"answer": answer, "latency_ms": latency_ms}


handler = Mangum(app)
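The rate limiter in app/main.py writes one item per request and then runs a query to count them. The IAM policy already grants dynamodb:UpdateItem, so one possible alternative is a single atomic counter per actor and minute. The sketch below assumes the same pk/ts key schema; the fixed sort-key value "COUNTER", the attribute name hits, and the function names are hypothetical and not part of the original code.

```python
from datetime import datetime, timezone
from typing import Optional


def minute_bucket_key(actor: str, now: datetime) -> dict:
    """Key of the counter item for this actor and UTC minute."""
    minute = now.astimezone(timezone.utc).strftime("%Y%m%d%H%M")
    # "COUNTER" is a hypothetical fixed sort key so each bucket is one item.
    return {"pk": f"ACTOR#{actor}#MIN#{minute}", "ts": "COUNTER"}


def within_limit(table, actor: str, limit: int, now: Optional[datetime] = None) -> bool:
    """Atomically increment the bucket counter; True if the caller is within the limit.

    `table` is expected to behave like a boto3 DynamoDB Table resource.
    """
    now = now or datetime.now(timezone.utc)
    resp = table.update_item(
        Key=minute_bucket_key(actor, now),
        # ADD creates the attribute at 0 if it is missing, then increments it.
        UpdateExpression="ADD #n :one",
        ExpressionAttributeNames={"#n": "hits"},
        ExpressionAttributeValues={":one": 1},
        ReturnValues="UPDATED_NEW",
    )
    return int(resp["Attributes"]["hits"]) <= limit
```

In the handler this would replace the put-then-query pair: raise the 429 when within_limit(table, x_employee_id, MAX_REQ_PER_MIN) returns False. Expiring old buckets (for example with a DynamoDB TTL attribute) is left out of the sketch.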

requirements.txt

fastapi==0.115.0
mangum==0.17.0
openai==1.51.0
boto3==1.35.0

AI Focus 8: Where teams usually get this wrong for production readiness (AI Coding Agents)

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt -t package
cp -r app package/
cd package && zip -r ../deepseek-agent.zip . && cd ..

aws lambda create-function \
--function-name "$FN_NAME" \
--runtime python3.12 \
--handler app.main.handler \
--role "arn:aws:iam::${ACCOUNT_ID}:role/${PROJECT}-lambda-role" \
--zip-file fileb://deepseek-agent.zip \
--timeout 30 --memory-size 1024 \
--environment "Variables={SECRET_NAME=$PROJECT/deepseek,USAGE_TABLE=$USAGE_TABLE,MAX_PROMPT_CHARS=12000,MAX_REQ_PER_MIN=30}"

API_ID=$(aws apigatewayv2 create-api --name "${PROJECT}-api" --protocol-type HTTP --target "arn:aws:lambda:${AWS_REGION}:${ACCOUNT_ID}:function:${FN_NAME}" --query ApiId --output text)

aws lambda add-permission \
--function-name "$FN_NAME" \
--statement-id apigw-access \
--action lambda:InvokeFunction \
--principal apigateway.amazonaws.com \
--source-arn "arn:aws:execute-api:${AWS_REGION}:${ACCOUNT_ID}:${API_ID}/*/*"

aws apigatewayv2 update-stage \
--api-id "$API_ID" \
--stage-name '$default' \
--default-route-settings ThrottlingBurstLimit=50,ThrottlingRateLimit=25

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt -t package
Copy-Item -Recurse app package
Compress-Archive -Path package\* -DestinationPath deepseek-agent.zip -Force

aws lambda create-function `
--function-name $env:FN_NAME `
--runtime python3.12 `
--handler app.main.handler `
--role "arn:aws:iam::$($env:ACCOUNT_ID):role/$($env:PROJECT)-lambda-role" `
--zip-file fileb://deepseek-agent.zip `
--timeout 30 --memory-size 1024 `
--environment "Variables={SECRET_NAME=$($env:PROJECT)/deepseek,USAGE_TABLE=$($env:USAGE_TABLE),MAX_PROMPT_CHARS=12000,MAX_REQ_PER_MIN=30}"

AI Focus 9: The practical decision path for sustained reliability (AI Coding Agents)

WAF rate-based rules for IP abuse.
JWT auth with employee identity claims.
Deny oversized inputs at API Gateway and app layer.
Strip secrets and access tokens from logs.
Restrict outbound domains at network egress controls where possible.
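As an illustration of the log-scrubbing item above, here is a minimal sketch of a redaction helper and a logging filter. The patterns are assumptions about what the tokens look like (bearer tokens, AWS access key IDs beginning with AKIA, and API keys with an sk- prefix); they are examples to adapt, not a complete list.

```python
import logging
import re

# (pattern, replacement) pairs; extend for the token formats your systems use.
_PATTERNS = [
    (re.compile(r"(?i)\bBearer\s+[A-Za-z0-9._~+/=-]+"), "Bearer [REDACTED]"),
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[REDACTED_AWS_KEY_ID]"),
    # Assumption: provider API keys use an "sk-" prefix.
    (re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"), "[REDACTED_API_KEY]"),
]


def redact(text: str) -> str:
    """Replace anything that looks like a credential with a placeholder."""
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text


class RedactingFilter(logging.Filter):
    """Logging filter that redacts the fully formatted message."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = ()  # message is already formatted
        return True
```

Attaching the filter with logging.getLogger().addFilter(RedactingFilter()) covers records logged through the root logger; handlers on other loggers would need the filter added separately.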

AI Focus 10: How to execute without guesswork for secure delivery (AI Coding Agents)

Default model: deepseek-v4-flash; escalate to deepseek-v4-pro only for complex cases.
Hard cap tokens (max_tokens) and input size.
Cache repeated review requests by hash of (repo, diff, question).
Track per-team token usage in DynamoDB and alarm on anomalies.
Use AWS Budgets and monthly charge alarms.
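The caching item above can be sketched as a deterministic key derived from the request fields. This is a minimal example, not the article's implementation: the function name and the CACHE# prefix are hypothetical, and including the model ID in the hash is an added assumption so that answers from different models do not collide.

```python
import hashlib
import json


def review_cache_key(repo: str, diff: str, question: str, model: str) -> str:
    """Stable cache key for a review request.

    Fields are serialized as JSON rather than concatenated so that
    ("ab", "c") and ("a", "bc") cannot produce the same input bytes.
    """
    payload = json.dumps(
        {"repo": repo, "diff": diff, "question": question, "model": model},
        sort_keys=True,
        separators=(",", ":"),
    ).encode("utf-8")
    return "CACHE#" + hashlib.sha256(payload).hexdigest()
```

The key could serve as the pk of a cache item in the usage table, with the cached answer stored alongside it and looked up before calling the model.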

Example budget:

aws budgets create-budget \
--account-id "$ACCOUNT_ID" \
--budget '{
"BudgetName":"deepseek-code-agent-monthly",
"BudgetLimit":{"Amount":"500","Unit":"USD"},
"TimeUnit":"MONTHLY",
"BudgetType":"COST"
}'

DeepSeek pricing changes can be frequent. Verify directly before final budgeting.

AI Focus 11: What to validate before shipping for predictable operations (AI Coding Agents)

CloudWatch dashboards:
  requests/min
  p95 latency
  4xx/5xx rates
  prompt/completion tokens
Alarms:
  sustained 429s
  Lambda errors
  unexpected token surge
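One way to get token and latency numbers onto those dashboards is the CloudWatch Embedded Metric Format, where a structured JSON log line written from Lambda is turned into metrics. The sketch below builds such a line; the namespace, dimension names, and function name are hypothetical choices, not values from the original setup.

```python
import json
import time
from typing import Optional


def token_metrics_emf(team: str, model: str, prompt_tokens: int,
                      completion_tokens: int, latency_ms: int,
                      now_ms: Optional[int] = None) -> str:
    """Return one Embedded Metric Format log line for a completed request."""
    doc = {
        "_aws": {
            "Timestamp": now_ms if now_ms is not None else int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": "DeepSeekCodeAgent",  # hypothetical namespace
                "Dimensions": [["Team", "Model"]],
                "Metrics": [
                    {"Name": "PromptTokens", "Unit": "Count"},
                    {"Name": "CompletionTokens", "Unit": "Count"},
                    {"Name": "LatencyMs", "Unit": "Milliseconds"},
                ],
            }],
        },
        # Dimension values and metric values live at the top level.
        "Team": team,
        "Model": model,
        "PromptTokens": prompt_tokens,
        "CompletionTokens": completion_tokens,
        "LatencyMs": latency_ms,
    }
    return json.dumps(doc)
```

Inside the Lambda handler, print(token_metrics_emf(...)) after each completion would emit the line to the function's log stream; alarms for token surges can then be defined on the resulting metrics.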

AI Focus 12: Tradeoffs that matter in production for exam and field confidence (AI Coding Agents)

Secret key stored only in Secrets Manager
Legacy model names removed before 2026-07-24 deadline
API + app-level throttling enabled
Prompt size and token caps enforced
Cost budget alarms active
Audit records include actor, model, token usage, and latency
Runbook for DeepSeek API outage and fallback model tested
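The last checklist item mentions a fallback model. A minimal sketch of the fallback logic follows, assuming the model call is wrapped in a callable that takes a model ID and a prompt. The article does not name a fallback model, so "fallback-model-id" in the usage note is a placeholder.

```python
from typing import Callable, Sequence, Tuple


def complete_with_fallback(call_model: Callable[[str, str], str],
                           models: Sequence[str],
                           prompt: str) -> Tuple[str, str]:
    """Try each model in order; return (model_used, answer) from the first success."""
    if not models:
        raise ValueError("At least one model ID is required")
    last_error = None
    for model in models:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # broad on purpose: any provider failure triggers fallback
            last_error = exc
    raise RuntimeError("All configured models failed") from last_error
```

For example, complete_with_fallback(call, ["deepseek-v4-flash", "fallback-model-id"], prompt), where call wraps the chat completion request. Recording which model actually answered keeps the audit trail accurate during an outage.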

AI Focus 13: Implementation details that change outcomes for cleaner ownership (AI Coding Agents)

For an internal coding assistant, this architecture gives strong security and cost control early. Keep model routing policy-driven, instrument token usage from day one, and treat rate limiting as a first-class reliability control.
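A policy-driven router can be as small as one function. The sketch below defaults to deepseek-v4-flash and escalates to deepseek-v4-pro for cases treated as complex; what counts as complex is not defined in this article, so the size threshold and keyword list are hypothetical placeholders to tune against real usage.

```python
FLASH = "deepseek-v4-flash"
PRO = "deepseek-v4-pro"

# Hypothetical escalation signals; replace with your own policy.
ESCALATION_KEYWORDS = ("race condition", "deadlock", "memory leak", "security")


def choose_model(diff: str, question: str, large_diff_chars: int = 8000) -> str:
    """Pick the model ID for a review request based on a simple policy."""
    if len(diff) > large_diff_chars:
        return PRO
    lowered = question.lower()
    if any(keyword in lowered for keyword in ESCALATION_KEYWORDS):
        return PRO
    return FLASH
```

Keeping the policy in one function makes it easy to log the routing decision next to the token usage, so cost changes can be traced back to policy changes.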

AI Focus 14: Runtime checks you should not skip for measurable outcomes (AI Coding Agents)

As of May 14, 2026, DeepSeek API documentation lists deepseek-v4-flash and deepseek-v4-pro as primary model IDs. Legacy names deepseek-chat and deepseek-reasoner are marked for deprecation on July 24, 2026. Always verify current model IDs and pricing before rollout:

https://api-docs.deepseek.com/
https://api-docs.deepseek.com/quick_start/pricing/
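A small startup check can catch a legacy model name before it reaches production. The sketch below uses the model IDs named above as of the article date; the function name is hypothetical, and the allowed set should be updated whenever the DeepSeek documentation changes.

```python
LEGACY_MODELS = {"deepseek-chat", "deepseek-reasoner"}
ALLOWED_MODELS = {"deepseek-v4-flash", "deepseek-v4-pro"}


def validate_model_id(model: str) -> str:
    """Return the model ID unchanged, or raise if it is legacy or unknown."""
    if model in LEGACY_MODELS:
        raise ValueError(f"Model '{model}' is a legacy name; switch to one of {sorted(ALLOWED_MODELS)}")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"Model '{model}' is not in the allowed list {sorted(ALLOWED_MODELS)}")
    return model
```

Calling validate_model_id on the DEEPSEEK_MODEL value read from Secrets Manager would fail fast on a stale configuration instead of failing on the first API call.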

AI Focus 15: How this maps to real exam objectives for fewer incident surprises (AI Coding Agents)

Internal coding assistants can quickly drift into risky behavior:

leaking code to external systems without policy control
unbounded token usage and rising cost
noisy suggestions with no observability
no rate controls, creating downstream API instability

The objective is a secure internal assistant API with controlled egress, scoped secrets, request quotas, and measurable quality/cost metrics.

AI Focus 16: Failure modes and quick prevention for this workload (AI Coding Agents)

Recommended baseline (cost-aware and secure)

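API Gateway HTTP API + JWT auth + WAF rate limit (AWS WAF has historically attached to REST APIs and CloudFront rather than directly to HTTP APIs, so confirm current support or front the API with CloudFront)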
Lambda (FastAPI + Mangum) for inference orchestration
DynamoDB for request cache and usage counters
Secrets Manager for DeepSeek API key
CloudWatch for metrics/alerts

This pattern minimizes fixed cost and is easier to operate than always-on clusters for early-stage usage.

graph TD
    Dev[Developers / CI Bots] --> APIGW[API Gateway HTTP API + JWT]
    WAF[AWS WAF rate-based rules] --> APIGW
    APIGW --> API[Lambda FastAPI Coding Assistant]
    API --> SM[Secrets Manager: DEEPSEEK_API_KEY]
    API --> DS[DeepSeek API]
    API --> DDB[(DynamoDB usage + cache)]
    API --> CW[CloudWatch Logs + Metrics]
    CW --> SNS[SNS Budget/Incident Alerts]
    BUD[AWS Budgets] --> SNS
