AI Coding Agents with DeepSeek Latest Model API on AWS and FastAPI
Scenario
A DevOps team wants an internal AI coding assistant that reviews code, explains errors, and suggests fixes using the latest available DeepSeek API.
Date-sensitive model note
As of May 14, 2026, DeepSeek API documentation lists deepseek-v4-flash and deepseek-v4-pro as primary model IDs. Legacy names deepseek-chat and deepseek-reasoner are marked for deprecation on July 24, 2026. Always verify current model IDs and pricing before rollout:
- https://api-docs.deepseek.com/
- https://api-docs.deepseek.com/quick_start/pricing/
Problem and Business Context
Internal coding assistants can quickly drift into risky behavior:
- leaking code to external systems without policy control
- unbounded token usage and rising cost
- noisy suggestions with no observability
- no rate controls, creating downstream API instability
The objective is a secure internal assistant API with controlled egress, scoped secrets, request quotas, and measurable quality/cost metrics.
AWS Architecture and Trade-offs
Recommended baseline (cost-aware and secure)
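- API Gateway HTTP API with a JWT authorizer; AWS WAF rate-based rules attach to a CloudFront distribution in front of it, because WAF cannot be associated directly with HTTP APIs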
- Lambda (FastAPI + Mangum) for inference orchestration
- DynamoDB for request cache and usage counters
- Secrets Manager for DeepSeek API key
- CloudWatch for metrics/alerts
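This pattern minimizes fixed cost and is easier to operate than always-on clusters for early-stage usage. The main trade-offs are cold starts and the 30-second maximum integration timeout of HTTP APIs, which caps how long a single review call can run. This is one reason to bound both max_tokens and input size.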
Step-by-Step Implementation
1) Environment bootstrap
Bash:
export AWS_REGION=us-east-1
export PROJECT=deepseek-code-agent
export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export FN_NAME=${PROJECT}-api
export USAGE_TABLE=${PROJECT}-usage
PowerShell:
$env:AWS_REGION = "us-east-1"
$env:PROJECT = "deepseek-code-agent"
$env:ACCOUNT_ID = (aws sts get-caller-identity --query Account --output text)
$env:FN_NAME = "$($env:PROJECT)-api"
$env:USAGE_TABLE = "$($env:PROJECT)-usage"
2) Create usage table and store DeepSeek secret
Bash:
aws dynamodb create-table \
--table-name "$USAGE_TABLE" \
--attribute-definitions AttributeName=pk,AttributeType=S AttributeName=ts,AttributeType=S \
--key-schema AttributeName=pk,KeyType=HASH AttributeName=ts,KeyType=RANGE \
--billing-mode PAY_PER_REQUEST \
--sse-specification Enabled=true
aws secretsmanager create-secret \
--name "${PROJECT}/deepseek" \
--description "DeepSeek API key and config" \
--secret-string '{"DEEPSEEK_API_KEY":"REPLACE_ME","DEEPSEEK_MODEL":"deepseek-v4-flash"}'
PowerShell:
aws dynamodb create-table `
--table-name $env:USAGE_TABLE `
--attribute-definitions AttributeName=pk,AttributeType=S AttributeName=ts,AttributeType=S `
--key-schema AttributeName=pk,KeyType=HASH AttributeName=ts,KeyType=RANGE `
--billing-mode PAY_PER_REQUEST `
--sse-specification Enabled=true
aws secretsmanager create-secret `
--name "$($env:PROJECT)/deepseek" `
--description "DeepSeek API key and config" `
--secret-string '{"DEEPSEEK_API_KEY":"REPLACE_ME","DEEPSEEK_MODEL":"deepseek-v4-flash"}'
In Windows PowerShell 5.1, the embedded double quotes in the --secret-string JSON are stripped before they reach the AWS CLI. In that shell, save the JSON to a file and pass --secret-string file://deepseek-secret.json instead.
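Because the app reads the model ID from this secret, a small pre-deployment check can catch a placeholder key or a legacy model name before rollout. The sketch below uses a hypothetical helper, validate_deepseek_secret (not part of any SDK). It assumes the JSON shape used above and treats deepseek-chat and deepseek-reasoner as legacy, per the model note.

```python
import json

# Legacy IDs flagged for deprecation in the date-sensitive model note (verify against current docs).
LEGACY_MODELS = {"deepseek-chat", "deepseek-reasoner"}
DEFAULT_MODEL = "deepseek-v4-flash"


def validate_deepseek_secret(secret_string: str) -> dict:
    """Hypothetical pre-deploy check for the {DEEPSEEK_API_KEY, DEEPSEEK_MODEL} secret payload."""
    cfg = json.loads(secret_string)
    key = cfg.get("DEEPSEEK_API_KEY", "")
    # Reject an empty key or the REPLACE_ME placeholder from the create-secret command.
    if not key or key == "REPLACE_ME":
        raise ValueError("DEEPSEEK_API_KEY is missing or still the placeholder")
    model = cfg.get("DEEPSEEK_MODEL", DEFAULT_MODEL)
    # Fail fast on legacy model names so they are gone before the deprecation date.
    if model in LEGACY_MODELS:
        raise ValueError(f"{model} is a legacy model ID; switch before the deprecation date")
    return {"DEEPSEEK_API_KEY": key, "DEEPSEEK_MODEL": model}
```

One way to feed it is the output of aws secretsmanager get-secret-value --secret-id "${PROJECT}/deepseek" --query SecretString --output text, which returns the raw JSON string.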
3) Least-privilege IAM policy
policy-deepseek-agent.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Logs",
      "Effect": "Allow",
      "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
      "Resource": "*"
    },
    {
      "Sid": "ReadSecret",
      "Effect": "Allow",
      "Action": ["secretsmanager:GetSecretValue"],
      "Resource": "arn:aws:secretsmanager:*:*:secret:deepseek-code-agent/deepseek*"
    },
    {
      "Sid": "UsageTable",
      "Effect": "Allow",
      "Action": ["dynamodb:PutItem", "dynamodb:UpdateItem", "dynamodb:Query"],
      "Resource": "arn:aws:dynamodb:*:*:table/deepseek-code-agent-usage"
    }
  ]
}
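Attach this policy to the Lambda execution role (deepseek-code-agent-lambda-role, referenced in step 5). That role must already exist, with a trust policy that allows lambda.amazonaws.com to assume it.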
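The guide never creates the execution role itself. The sketch below generates a minimal trust policy that lets only the Lambda service assume the role. The output file name trust-lambda.json and the inline policy name deepseek-agent in the commands that follow are assumptions.

```python
import json


def lambda_trust_policy() -> dict:
    # Trust policy: only the Lambda service principal may assume this execution role.
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def write_policy(path: str, policy: dict) -> None:
    # Write the policy as JSON so the AWS CLI can read it via file://<path>.
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(policy, fh, indent=2)


# Example (hypothetical file name):
#   write_policy("trust-lambda.json", lambda_trust_policy())
```

With the file written, aws iam create-role --role-name "${PROJECT}-lambda-role" --assume-role-policy-document file://trust-lambda.json creates the role, and aws iam put-role-policy --role-name "${PROJECT}-lambda-role" --policy-name deepseek-agent --policy-document file://policy-deepseek-agent.json attaches the permissions above as an inline policy.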
4) FastAPI application with rate limiting and DeepSeek integration
app/main.py
import json
import os
import time
from datetime import datetime, timezone
from functools import lru_cache

import boto3
from fastapi import FastAPI, Header, HTTPException
from mangum import Mangum
from openai import APIError, OpenAI
from pydantic import BaseModel, Field

app = FastAPI(title="Internal DeepSeek Coding Agent")
# AWS clients are created once per Lambda container and reused across warm invocations.
secrets = boto3.client("secretsmanager")
ddb = boto3.resource("dynamodb")

SECRET_NAME = os.environ["SECRET_NAME"]
USAGE_TABLE = os.environ["USAGE_TABLE"]
MAX_PROMPT_CHARS = int(os.environ.get("MAX_PROMPT_CHARS", "12000"))
MAX_REQ_PER_MIN = int(os.environ.get("MAX_REQ_PER_MIN", "30"))
table = ddb.Table(USAGE_TABLE)

SYSTEM_PROMPT = "You are a senior code reviewer. Focus on correctness, security, and operational risk."


class ReviewRequest(BaseModel):
    repo: str = Field(min_length=2)
    diff: str = Field(min_length=5, max_length=50000)
    question: str = Field(min_length=3, max_length=6000)


@lru_cache(maxsize=1)
def _load_secret() -> dict:
    # Cached for the container's lifetime; a rotated key takes effect on the next cold start.
    payload = secrets.get_secret_value(SecretId=SECRET_NAME)
    return json.loads(payload["SecretString"])


@lru_cache(maxsize=1)
def _deepseek_client() -> OpenAI:
    # DeepSeek exposes an OpenAI-compatible API. No SDK retries, and a timeout below
    # the 30 s Lambda/API Gateway limit, so a slow call fails cleanly instead of hanging.
    return OpenAI(
        api_key=_load_secret()["DEEPSEEK_API_KEY"],
        base_url="https://api.deepseek.com",
        timeout=25.0,
        max_retries=0,
    )


def _rate_limit(actor: str) -> None:
    # Fixed one-minute window: one atomic counter item per actor per minute.
    # expires_at is meant as a TTL attribute; enable it with
    # aws dynamodb update-time-to-live --table-name <table> --time-to-live-specification Enabled=true,AttributeName=expires_at
    minute = datetime.now(timezone.utc).strftime("%Y%m%d%H%M")
    resp = table.update_item(
        Key={"pk": f"ACTOR#{actor}#MIN#{minute}", "ts": "COUNTER"},
        UpdateExpression="SET expires_at = :exp ADD #count :one",
        ExpressionAttributeNames={"#count": "count"},
        ExpressionAttributeValues={":exp": int(time.time()) + 300, ":one": 1},
        ReturnValues="UPDATED_NEW",
    )
    if int(resp["Attributes"]["count"]) > MAX_REQ_PER_MIN:
        raise HTTPException(status_code=429, detail="Rate limit exceeded")


@app.get("/health")
def health() -> dict:
    return {"ok": True}


@app.post("/review")
def review(req: ReviewRequest, x_employee_id: str = Header(default="unknown")) -> dict:
    # X-Employee-Id is client-supplied; once a JWT authorizer is attached, prefer its claims.
    _rate_limit(x_employee_id)
    if len(req.diff) > MAX_PROMPT_CHARS:
        raise HTTPException(status_code=400, detail="Diff too large; upload summarized diff")

    model = _load_secret().get("DEEPSEEK_MODEL", "deepseek-v4-flash")
    user_prompt = f"Repository: {req.repo}\nQuestion: {req.question}\nDiff:\n{req.diff}"
    started = time.time()
    try:
        completion = _deepseek_client().chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": user_prompt},
            ],
            temperature=0.1,
            max_tokens=1200,
        )
    except APIError as exc:
        # Timeouts, connection failures, and non-2xx responses from DeepSeek.
        raise HTTPException(status_code=502, detail="DeepSeek API error") from exc
    latency_ms = int((time.time() - started) * 1000)

    usage = completion.usage
    # Audit record: actor, model, token usage, and latency.
    table.put_item(Item={
        "pk": f"REQ#{x_employee_id}",
        "ts": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "latency_ms": latency_ms,
        "prompt_tokens": usage.prompt_tokens if usage else 0,
        "completion_tokens": usage.completion_tokens if usage else 0,
    })
    return {"answer": completion.choices[0].message.content, "latency_ms": latency_ms}


handler = Mangum(app)
requirements.txt
fastapi==0.115.0
mangum==0.17.0
openai==1.51.0
boto3==1.35.0
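The handler keys rate limits and audit records on the X-Employee-Id header, which any caller can set. Once the JWT authorizer from step 6 is in place, a helper like the one below could derive the actor from verified claims instead. This sketch assumes the HTTP API payload format 2.0, where JWT claims appear under requestContext.authorizer.jwt.claims, and Mangum exposing the raw Lambda event as request.scope["aws.event"]. The claim name "sub" is also an assumption; your identity provider may use email or a custom employee-ID claim.

```python
from typing import Any, Mapping, Optional


def actor_from_event(event: Mapping[str, Any], claim: str = "sub") -> Optional[str]:
    """Return the caller identity from an HTTP API (payload v2.0) JWT authorizer event.

    Returns None when no JWT authorizer context is present (for example, local testing).
    """
    claims = (
        event.get("requestContext", {})
        .get("authorizer", {})
        .get("jwt", {})
        .get("claims", {})
    )
    value = claims.get(claim)
    return str(value) if value else None


# Hypothetical wiring inside the FastAPI route (add `request: Request` to the signature):
#   actor = actor_from_event(request.scope.get("aws.event", {})) or "unknown"

if __name__ == "__main__":
    sample = {"requestContext": {"authorizer": {"jwt": {"claims": {"sub": "emp-123"}, "scopes": None}}}}
    print(actor_from_event(sample))  # emp-123
```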
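5) Deploy Lambda and API Gateway
Bash:
python -m venv .venv
source .venv/bin/activate
# Fetch Linux x86_64 wheels so compiled dependencies (e.g., pydantic-core) match the Lambda runtime
pip install -r requirements.txt -t package --platform manylinux2014_x86_64 --implementation cp --python-version 3.12 --only-binary=:all: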
cp -r app package/
cd package && zip -r ../deepseek-agent.zip . && cd ..
aws lambda create-function \
--function-name "$FN_NAME" \
--runtime python3.12 \
--handler app.main.handler \
--role "arn:aws:iam::${ACCOUNT_ID}:role/${PROJECT}-lambda-role" \
--zip-file fileb://deepseek-agent.zip \
--timeout 30 --memory-size 1024 \
--environment "Variables={SECRET_NAME=$PROJECT/deepseek,USAGE_TABLE=$USAGE_TABLE,MAX_PROMPT_CHARS=12000,MAX_REQ_PER_MIN=30}"
API_ID=$(aws apigatewayv2 create-api --name "${PROJECT}-api" --protocol-type HTTP --target "arn:aws:lambda:${AWS_REGION}:${ACCOUNT_ID}:function:${FN_NAME}" --query ApiId --output text)
aws lambda add-permission \
--function-name "$FN_NAME" \
--statement-id apigw-access \
--action lambda:InvokeFunction \
--principal apigateway.amazonaws.com \
--source-arn "arn:aws:execute-api:${AWS_REGION}:${ACCOUNT_ID}:${API_ID}/*/*"
aws apigatewayv2 update-stage \
--api-id "$API_ID" \
--stage-name '$default' \
--default-route-settings ThrottlingBurstLimit=50,ThrottlingRateLimit=25
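PowerShell:
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt -t package --platform manylinux2014_x86_64 --implementation cp --python-version 3.12 --only-binary=:all:
Copy-Item -Recurse app package
Compress-Archive -Path package\* -DestinationPath deepseek-agent.zip -Force
aws lambda create-function `
--function-name $env:FN_NAME `
--runtime python3.12 `
--handler app.main.handler `
--role "arn:aws:iam::$($env:ACCOUNT_ID):role/$($env:PROJECT)-lambda-role" `
--zip-file fileb://deepseek-agent.zip `
--timeout 30 --memory-size 1024 `
--environment "Variables={SECRET_NAME=$($env:PROJECT)/deepseek,USAGE_TABLE=$($env:USAGE_TABLE),MAX_PROMPT_CHARS=12000,MAX_REQ_PER_MIN=30}"
$env:API_ID = (aws apigatewayv2 create-api --name "$($env:PROJECT)-api" --protocol-type HTTP --target "arn:aws:lambda:$($env:AWS_REGION):$($env:ACCOUNT_ID):function:$($env:FN_NAME)" --query ApiId --output text)
aws lambda add-permission `
--function-name $env:FN_NAME `
--statement-id apigw-access `
--action lambda:InvokeFunction `
--principal apigateway.amazonaws.com `
--source-arn "arn:aws:execute-api:$($env:AWS_REGION):$($env:ACCOUNT_ID):$($env:API_ID)/*/*"
aws apigatewayv2 update-stage `
--api-id $env:API_ID `
--stage-name '$default' `
--default-route-settings "ThrottlingBurstLimit=50,ThrottlingRateLimit=25"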
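After deployment, a quick smoke test against the $default stage exercises the route, the Lambda permission, and the DeepSeek call end to end. The sketch below uses only the Python standard library. It assumes the default HTTP API invoke URL, https://{api-id}.execute-api.{region}.amazonaws.com, under which the $default stage is served at the root. The bearer token only matters once a JWT authorizer is attached, and the "smoke-test" employee ID is a placeholder.

```python
import json
import urllib.request
from typing import Optional


def build_review_request(api_id: str, region: str, payload: dict,
                         token: Optional[str] = None) -> urllib.request.Request:
    # The $default stage of an HTTP API has no stage prefix in the path.
    url = f"https://{api_id}.execute-api.{region}.amazonaws.com/review"
    headers = {"Content-Type": "application/json", "X-Employee-Id": "smoke-test"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), headers=headers, method="POST"
    )


# Example (sends a real request; requires the stack from step 5):
#   req = build_review_request("<api-id>", "us-east-1",
#                              {"repo": "demo", "diff": "- a = 1\n+ a = 2", "question": "Any risk?"})
#   with urllib.request.urlopen(req, timeout=35) as resp:
#       print(resp.status, json.loads(resp.read())["latency_ms"])
```

A 200 response confirms the integration. A 429 means the per-actor limit was hit, and a 502 from the app means the DeepSeek call failed.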
6) Add protective controls
- WAF rate-based rules for IP abuse.
- JWT auth with employee identity claims.
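- Deny oversized inputs at the edge (for example, an AWS WAF size-constraint rule on the request body) and at the app layer (Pydantic max_length plus MAX_PROMPT_CHARS); HTTP APIs do not offer request validation.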
- Strip secrets and access tokens from logs.
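- Restrict outbound traffic to api.deepseek.com where possible, for example by running the Lambda in a VPC whose egress passes through AWS Network Firewall with a domain allowlist.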
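For the log-stripping control, a logging filter can mask common credential shapes before records are emitted. This is a minimal sketch. The regular expressions (bearer tokens, sk- prefixed API keys, AWS access key IDs) are illustrative assumptions, not an exhaustive secret scanner, and the sk- prefix should be checked against your actual key format.

```python
import logging
import re

# Illustrative patterns only; extend for the credential formats your organization uses.
_PATTERNS = [
    (re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._\-]+"), "Bearer [REDACTED]"),
    (re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"), "sk-[REDACTED]"),
    (re.compile(r"\b(AKIA|ASIA)[A-Z0-9]{16}\b"), "[REDACTED_AWS_KEY_ID]"),
]


def redact(text: str) -> str:
    # Apply each pattern in order; earlier matches are already masked for later ones.
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text


class RedactingFilter(logging.Filter):
    """Rewrites each record's fully formatted message so secrets never reach CloudWatch Logs."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = ()  # message is already formatted; prevent a second % substitution
        return True
```

A filter added to a logger only sees records created through that logger, not those propagated from child loggers. Attaching it to each handler (for example, every handler on logging.getLogger()) covers everything the function emits.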
7) Cost controls that matter
- Default model: deepseek-v4-flash; escalate to deepseek-v4-pro only for complex cases.
- Hard-cap tokens (max_tokens) and input size.
- Cache repeated review requests by a hash of (repo, diff, question).
- Track per-team token usage in DynamoDB and alarm on anomalies.
- Use AWS Budgets and monthly charge alarms.
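The caching and routing bullets can be sketched as two small helpers. cache_key hashes the (repo, diff, question) tuple with SHA-256, so identical review requests map to the same item; the CACHE# key prefix is an assumed layout for the usage table. choose_model is a hypothetical escalation rule: the 8,000-character threshold and the explicit deep_review flag are illustrative, not DeepSeek guidance.

```python
import hashlib
import json

FLASH_MODEL = "deepseek-v4-flash"
PRO_MODEL = "deepseek-v4-pro"


def cache_key(repo: str, diff: str, question: str) -> str:
    # JSON-encode the tuple so field boundaries are unambiguous ("ab","c" != "a","bc").
    canonical = json.dumps([repo, diff, question], ensure_ascii=False, separators=(",", ":"))
    return "CACHE#" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def choose_model(diff: str, deep_review: bool = False, pro_threshold_chars: int = 8000) -> str:
    # Hypothetical policy: default to flash; escalate only on explicit request or large diffs.
    if deep_review or len(diff) > pro_threshold_chars:
        return PRO_MODEL
    return FLASH_MODEL
```

Cached answers could reuse a TTL attribute so stale reviews age out. Reading them back requires dynamodb:GetItem, which the IAM policy in step 3 does not grant.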
Example budget:
aws budgets create-budget \
--account-id "$ACCOUNT_ID" \
--budget '{
"BudgetName":"deepseek-code-agent-monthly",
"BudgetLimit":{"Amount":"500","Unit":"USD"},
"TimeUnit":"MONTHLY",
"BudgetType":"COST"
}'
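AWS Budgets tracks only AWS charges (Lambda, API Gateway, DynamoDB, and so on). DeepSeek API usage is billed by DeepSeek, so estimate that spend from the token counts recorded in DynamoDB. DeepSeek pricing can change frequently; verify it directly before finalizing budgets.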
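A per-team DeepSeek spend estimate can be computed from the prompt_tokens and completion_tokens fields in the audit records: cost = (prompt_tokens × input price + completion_tokens × output price) / 1,000,000, with prices quoted per million tokens. The prices in this sketch are placeholders, not DeepSeek's actual rates. DeepSeek has also priced cache-hit input tokens differently from cache misses, so check whether your rate card needs a third term.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping


@dataclass(frozen=True)
class Prices:
    # USD per one million tokens. PLACEHOLDER values; take real figures from the pricing page.
    input_per_m: float
    output_per_m: float


def estimate_cost(records: Iterable[Mapping[str, object]], prices: Prices) -> float:
    """Estimate USD cost for audit records carrying prompt/completion token counts."""
    prompt_tokens = completion_tokens = 0
    for r in records:
        # int() also accepts the Decimal values boto3 returns for DynamoDB numbers.
        prompt_tokens += int(r.get("prompt_tokens", 0))
        completion_tokens += int(r.get("completion_tokens", 0))
    return (prompt_tokens * prices.input_per_m + completion_tokens * prices.output_per_m) / 1_000_000
```

With placeholder prices of 0.50 and 2.00 USD per million tokens, two reviews totalling 8,000 prompt tokens and 2,000 completion tokens come to (8,000 × 0.50 + 2,000 × 2.00) / 1,000,000 = 0.008 USD.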
8) Monitoring
- CloudWatch dashboards:
  - requests/min
  - p95 latency
  - 4xx/5xx rates
  - prompt/completion tokens
- Alarms:
  - sustained 429s
  - Lambda errors
  - unexpected token surge
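Token counts are not a built-in Lambda or API Gateway metric, so they have to be published as custom metrics. One low-overhead option is CloudWatch Embedded Metric Format (EMF): the function prints a structured JSON log line, and CloudWatch extracts the metrics from the Lambda log group without any PutMetricData calls. The namespace DeepSeekCodeAgent and the Model dimension in this sketch are assumptions.

```python
import json
import time
from typing import Optional


def emf_line(model: str, prompt_tokens: int, completion_tokens: int, latency_ms: int,
             timestamp_ms: Optional[int] = None) -> str:
    """Build one CloudWatch Embedded Metric Format record (namespace/dimension names are assumed)."""
    return json.dumps({
        "_aws": {
            "Timestamp": timestamp_ms if timestamp_ms is not None else int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": "DeepSeekCodeAgent",
                "Dimensions": [["Model"]],
                "Metrics": [
                    {"Name": "PromptTokens", "Unit": "Count"},
                    {"Name": "CompletionTokens", "Unit": "Count"},
                    {"Name": "LatencyMs", "Unit": "Milliseconds"},
                ],
            }],
        },
        # Dimension and metric values live at the top level of the record.
        "Model": model,
        "PromptTokens": prompt_tokens,
        "CompletionTokens": completion_tokens,
        "LatencyMs": latency_ms,
    })


# Illustrative call in the /review handler, after the completion returns:
#   print(emf_line(model_id, prompt_tokens, completion_tokens, latency_ms))
```

An alarm on the Sum of PromptTokens and CompletionTokens over a short period can then serve as the "unexpected token surge" alarm.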
9) Production checklist
- Secret key stored only in Secrets Manager
- Legacy model names removed before 2026-07-24 deadline
- API + app-level throttling enabled
- Prompt size and token caps enforced
- Cost budget alarms active
- Audit records include actor, model, token usage, and latency
- Runbook for DeepSeek API outage and fallback model tested
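The outage runbook is easier to test if the fallback path is a plain function. The sketch below tries a list of model IDs in order and moves to the next only on errors classed as transient. The model order, the injected create callable, and the choice of retryable exceptions are all assumptions. With the OpenAI SDK, create could wrap client.chat.completions.create, and the retryable exceptions might be openai.APIConnectionError and openai.InternalServerError. If the whole DeepSeek API is down, a fallback on the same provider will not help, so create may need to dispatch some model IDs to a different client.

```python
from typing import Callable, Optional, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


def complete_with_fallback(
    create: Callable[[str], T],
    models: Sequence[str],
    retryable: Tuple[Type[BaseException], ...],
) -> Tuple[str, T]:
    """Call create(model) for each model in order and return (model, result) from the first success.

    Only exceptions listed in `retryable` trigger fallback; anything else propagates immediately.
    """
    if not models:
        raise ValueError("at least one model is required")
    last_error: Optional[BaseException] = None
    for model in models:
        try:
            return model, create(model)
        except retryable as exc:
            last_error = exc  # remember the failure and try the next model
    raise RuntimeError(f"all models failed: {list(models)}") from last_error
```

A runbook drill might call complete_with_fallback(lambda m: client.chat.completions.create(model=m, messages=messages, max_tokens=1200), ["deepseek-v4-flash", "deepseek-v4-pro"], (openai.APIConnectionError, openai.InternalServerError)) with the primary deliberately pointed at an unreachable endpoint.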
Final recommendation
For an internal coding assistant, this architecture gives strong security and cost control early. Keep model routing policy-driven, instrument token usage from day one, and treat rate limiting as a first-class reliability control.