AI Coding Agents with DeepSeek Latest Model API on AWS and FastAPI
This guide turns the idea of an internal AI coding agent, built on the latest DeepSeek model API with AWS and FastAPI, into a usable execution plan with concrete checks and production-minded guardrails.
AI Focus 1: Risk controls worth enforcing early for predictable operations (Ai Coding Agents)
A DevOps team wants an internal AI coding assistant that reviews code, explains errors, and suggests fixes using the latest available DeepSeek API.
Editorial review note for Ai Coding Agents
This section was reviewed by a human editor to keep the recommendations actionable and technically grounded. Reviewed by: Med Amine Mahmoud. Last editorial review: 2026-05-26T16:10:01Z.
AI Focus 3: How to keep cost and reliability aligned for cleaner ownership (Ai Coding Agents)
AI Focus 4: What to document for your team for measurable outcomes (Ai Coding Agents)
Bash:
export AWS_REGION=us-east-1
export PROJECT=deepseek-code-agent
export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export FN_NAME=${PROJECT}-api
export USAGE_TABLE=${PROJECT}-usage
PowerShell:
$env:AWS_REGION = "us-east-1"
$env:PROJECT = "deepseek-code-agent"
$env:ACCOUNT_ID = (aws sts get-caller-identity --query Account --output text)
$env:FN_NAME = "$($env:PROJECT)-api"
$env:USAGE_TABLE = "$($env:PROJECT)-usage"
AI Focus 5: Where this architecture earns its value for fewer incident surprises (Ai Coding Agents)
Bash:
aws dynamodb create-table \
--table-name "$USAGE_TABLE" \
--attribute-definitions AttributeName=pk,AttributeType=S AttributeName=ts,AttributeType=S \
--key-schema AttributeName=pk,KeyType=HASH AttributeName=ts,KeyType=RANGE \
--billing-mode PAY_PER_REQUEST \
--sse-specification Enabled=true
aws secretsmanager create-secret \
--name "${PROJECT}/deepseek" \
--description "DeepSeek API key and config" \
--secret-string '{"DEEPSEEK_API_KEY":"REPLACE_ME","DEEPSEEK_MODEL":"deepseek-v4-flash"}'
PowerShell:
aws dynamodb create-table `
--table-name $env:USAGE_TABLE `
--attribute-definitions AttributeName=pk,AttributeType=S AttributeName=ts,AttributeType=S `
--key-schema AttributeName=pk,KeyType=HASH AttributeName=ts,KeyType=RANGE `
--billing-mode PAY_PER_REQUEST `
--sse-specification Enabled=true
aws secretsmanager create-secret `
--name "$($env:PROJECT)/deepseek" `
--description "DeepSeek API key and config" `
--secret-string '{"DEEPSEEK_API_KEY":"REPLACE_ME","DEEPSEEK_MODEL":"deepseek-v4-flash"}'
AI Focus 6: Operational notes from real-world usage for this workload (Ai Coding Agents)
policy-deepseek-agent.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Logs",
      "Effect": "Allow",
      "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
      "Resource": "*"
    },
    {
      "Sid": "ReadSecret",
      "Effect": "Allow",
      "Action": ["secretsmanager:GetSecretValue"],
      "Resource": "arn:aws:secretsmanager:*:*:secret:deepseek-code-agent/deepseek*"
    },
    {
      "Sid": "UsageTable",
      "Effect": "Allow",
      "Action": ["dynamodb:PutItem", "dynamodb:UpdateItem", "dynamodb:Query"],
      "Resource": "arn:aws:dynamodb:*:*:table/deepseek-code-agent-usage"
    }
  ]
}
Attach this policy to the Lambda execution role (the deploy commands assume a role named ${PROJECT}-lambda-role).
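The policy above leaves region and account as `*` in the secret and table ARNs. As a minimal sketch, the same three statements could be generated with those fields pinned. `build_policy` and its parameters are hypothetical names introduced for illustration, and the account ID in the usage line is a placeholder.

```python
import json


def build_policy(region: str, account_id: str, project: str) -> dict:
    """Return the same three statements with region and account pinned."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Logs",
                "Effect": "Allow",
                "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": "*",
            },
            {
                "Sid": "ReadSecret",
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                # Secrets Manager appends a random suffix to secret ARNs,
                # so the trailing wildcard is kept.
                "Resource": f"arn:aws:secretsmanager:{region}:{account_id}:secret:{project}/deepseek*",
            },
            {
                "Sid": "UsageTable",
                "Effect": "Allow",
                "Action": ["dynamodb:PutItem", "dynamodb:UpdateItem", "dynamodb:Query"],
                "Resource": f"arn:aws:dynamodb:{region}:{account_id}:table/{project}-usage",
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID; substitute the value of ACCOUNT_ID.
    print(json.dumps(build_policy("us-east-1", "123456789012", "deepseek-code-agent"), indent=2))
```

Pinning the ARNs is a judgment call: it narrows the blast radius of the role, at the cost of regenerating the policy per region.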
AI Focus 7: How to avoid expensive rework for your runbook (Ai Coding Agents)
app/main.py
import json
import os
import time
from datetime import datetime, timezone

import boto3
from fastapi import FastAPI, HTTPException, Header
from mangum import Mangum
from openai import OpenAI
from pydantic import BaseModel, Field

app = FastAPI(title="Internal DeepSeek Coding Agent")

# Created once per Lambda execution environment and reused across invocations.
secrets = boto3.client("secretsmanager")
ddb = boto3.resource("dynamodb")

SECRET_NAME = os.environ["SECRET_NAME"]
USAGE_TABLE = os.environ["USAGE_TABLE"]
MAX_PROMPT_CHARS = int(os.environ.get("MAX_PROMPT_CHARS", "12000"))
MAX_REQ_PER_MIN = int(os.environ.get("MAX_REQ_PER_MIN", "30"))

table = ddb.Table(USAGE_TABLE)


class ReviewRequest(BaseModel):
    repo: str = Field(min_length=2)
    diff: str = Field(min_length=5, max_length=50000)
    question: str = Field(min_length=3, max_length=6000)


def _load_secret() -> dict:
    payload = secrets.get_secret_value(SecretId=SECRET_NAME)
    return json.loads(payload["SecretString"])


def _rate_limit(actor: str) -> None:
    # Write one item into the actor's current-minute bucket, then count the bucket.
    minute = datetime.now(timezone.utc).strftime("%Y%m%d%H%M")
    key = f"ACTOR#{actor}#MIN#{minute}"
    table.put_item(Item={"pk": key, "ts": str(time.time())})
    resp = table.query(
        KeyConditionExpression="pk = :pk",
        ExpressionAttributeValues={":pk": key},
        Select="COUNT",
    )
    if resp.get("Count", 0) > MAX_REQ_PER_MIN:
        raise HTTPException(status_code=429, detail="Rate limit exceeded")


@app.get("/health")
def health() -> dict:
    return {"ok": True}


@app.post("/review")
def review(req: ReviewRequest, x_employee_id: str = Header(default="unknown")) -> dict:
    _rate_limit(x_employee_id)
    if len(req.diff) > MAX_PROMPT_CHARS:
        raise HTTPException(status_code=400, detail="Diff too large; upload summarized diff")

    cfg = _load_secret()
    model = cfg.get("DEEPSEEK_MODEL", "deepseek-v4-flash")
    client = OpenAI(api_key=cfg["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")
    prompt = (
        "You are a senior code reviewer. Focus on correctness, security, and operational risk.\n"
        f"Repository: {req.repo}\n"
        f"Question: {req.question}\n"
        f"Diff:\n{req.diff[:MAX_PROMPT_CHARS]}"
    )

    started = time.time()
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1,
        max_tokens=1200,
    )
    latency_ms = int((time.time() - started) * 1000)
    answer = completion.choices[0].message.content
    usage = getattr(completion, "usage", None)

    # Audit record: actor (in pk), model, token usage, and latency.
    table.put_item(Item={
        "pk": f"REQ#{x_employee_id}",
        "ts": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "latency_ms": latency_ms,
        "prompt_tokens": getattr(usage, "prompt_tokens", 0) if usage else 0,
        "completion_tokens": getattr(usage, "completion_tokens", 0) if usage else 0,
    })
    return {"answer": answer, "latency_ms": latency_ms}


handler = Mangum(app)
requirements.txt
fastapi==0.115.0
mangum==0.17.0
openai==1.51.0
boto3==1.35.0
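The `_rate_limit` helper in app/main.py writes one item per request and then counts the bucket with a query. One alternative, sketched here under assumptions, is a single atomic counter per actor and time window, which the `dynamodb:UpdateItem` permission in the policy already allows. The function names, the `hits` and `expires_at` attribute names, and the fixed `COUNTER` sort key are all hypothetical, and `expires_at` only cleans up old buckets if TTL is enabled on the table for that attribute.

```python
def counter_update_kwargs(actor: str, now: float, window_s: int = 60) -> dict:
    """Build UpdateItem arguments for a per-actor, per-window atomic counter."""
    window = int(now // window_s)
    return {
        # The table uses pk + ts as its key, so a fixed ts value marks the counter item.
        "Key": {"pk": f"ACTOR#{actor}#WIN#{window}", "ts": "COUNTER"},
        # ADD increments atomically; SET records an expiry time only on first write.
        "UpdateExpression": "ADD #n :one SET #exp = if_not_exists(#exp, :exp)",
        "ExpressionAttributeNames": {"#n": "hits", "#exp": "expires_at"},
        "ExpressionAttributeValues": {":one": 1, ":exp": (window + 2) * window_s},
        "ReturnValues": "UPDATED_NEW",
    }


def over_limit(update_response: dict, max_per_window: int) -> bool:
    """Compare the counter value returned by UpdateItem against the limit."""
    return int(update_response["Attributes"]["hits"]) > max_per_window


# Possible use inside the app (not executed here):
#   resp = table.update_item(**counter_update_kwargs(actor, time.time()))
#   if over_limit(resp, MAX_REQ_PER_MIN):
#       raise HTTPException(status_code=429, detail="Rate limit exceeded")
```

This trades one write plus one query for a single write, and it keeps one item per actor per window rather than one per request.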
AI Focus 8: Where teams usually get this wrong for production readiness (Ai Coding Agents)
Bash:
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt -t package
cp -r app package/
cd package && zip -r ../deepseek-agent.zip . && cd ..
aws lambda create-function \
--function-name "$FN_NAME" \
--runtime python3.12 \
--handler app.main.handler \
--role "arn:aws:iam::${ACCOUNT_ID}:role/${PROJECT}-lambda-role" \
--zip-file fileb://deepseek-agent.zip \
--timeout 30 --memory-size 1024 \
--environment "Variables={SECRET_NAME=$PROJECT/deepseek,USAGE_TABLE=$USAGE_TABLE,MAX_PROMPT_CHARS=12000,MAX_REQ_PER_MIN=30}"
API_ID=$(aws apigatewayv2 create-api --name "${PROJECT}-api" --protocol-type HTTP --target "arn:aws:lambda:${AWS_REGION}:${ACCOUNT_ID}:function:${FN_NAME}" --query ApiId --output text)
aws lambda add-permission \
--function-name "$FN_NAME" \
--statement-id apigw-access \
--action lambda:InvokeFunction \
--principal apigateway.amazonaws.com \
--source-arn "arn:aws:execute-api:${AWS_REGION}:${ACCOUNT_ID}:${API_ID}/*/*"
aws apigatewayv2 update-stage \
--api-id "$API_ID" \
--stage-name '$default' \
--default-route-settings ThrottlingBurstLimit=50,ThrottlingRateLimit=25
PowerShell:
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt -t package
Copy-Item -Recurse app package
Compress-Archive -Path package\* -DestinationPath deepseek-agent.zip -Force
aws lambda create-function `
--function-name $env:FN_NAME `
--runtime python3.12 `
--handler app.main.handler `
--role "arn:aws:iam::$($env:ACCOUNT_ID):role/$($env:PROJECT)-lambda-role" `
--zip-file fileb://deepseek-agent.zip `
--timeout 30 --memory-size 1024 `
--environment "Variables={SECRET_NAME=$($env:PROJECT)/deepseek,USAGE_TABLE=$($env:USAGE_TABLE),MAX_PROMPT_CHARS=12000,MAX_REQ_PER_MIN=30}"
AI Focus 9: The practical decision path for sustained reliability (Ai Coding Agents)
- WAF rate-based rules for IP abuse.
- JWT auth with employee identity claims.
- Deny oversized inputs at API Gateway and app layer.
- Strip secrets and access tokens from logs.
- Restrict outbound domains at network egress controls where possible.
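For the log-scrubbing item above, a minimal sketch of a redaction pass applied to text before it is logged. The patterns are illustrative assumptions about what tokens look like (bearer tokens, key-shaped strings starting with `sk-`, AWS access key IDs); they are not an exhaustive or authoritative list, and `redact` is a hypothetical helper.

```python
import re

# (pattern, replacement) pairs; all shapes here are assumptions for illustration.
_RULES = [
    # "Bearer <token>" in headers or copied curl commands; keep the scheme word.
    (re.compile(r"\b(bearer)\s+[A-Za-z0-9._~+/=-]{8,}", re.IGNORECASE), r"\1 [REDACTED]"),
    # Key-shaped strings that start with "sk-".
    (re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"), "[REDACTED]"),
    # AWS access key ID shape: "AKIA" followed by 16 uppercase letters or digits.
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[REDACTED]"),
]


def redact(text: str) -> str:
    """Replace token-shaped substrings before the text reaches a log line."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(redact("Authorization: Bearer abcDEF123456.xyz"))
```

Pattern-based redaction misses anything that does not match a known shape, so it complements, rather than replaces, not logging request bodies at all.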
AI Focus 10: How to execute without guesswork for secure delivery (Ai Coding Agents)
- Default model: deepseek-v4-flash; escalate to deepseek-v4-pro only for complex cases.
- Hard cap tokens (max_tokens) and input size.
- Cache repeated review requests by hash of (repo, diff, question).
- Track per-team token usage in DynamoDB and alarm on anomalies.
- Use AWS Budgets and monthly charge alarms.
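The caching item above can be sketched as a deterministic key derived from the request fields. `review_cache_key` and the `CACHE#` prefix are hypothetical; the fields are serialized as a JSON list before hashing so that field boundaries stay unambiguous.

```python
import hashlib
import json


def review_cache_key(repo: str, diff: str, question: str) -> str:
    """Derive a stable cache key from (repo, diff, question)."""
    # A JSON list keeps ("a", "bc") distinct from ("ab", "c"),
    # which plain string concatenation would not.
    payload = json.dumps([repo, diff, question], ensure_ascii=False, separators=(",", ":"))
    return "CACHE#" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(review_cache_key("payments-service", "diff --git a/x b/x", "Is this safe?"))
```

If the configured model can change between requests, including the model ID in the hashed list would avoid serving an answer produced by a different model.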
Example budget:
aws budgets create-budget \
--account-id "$ACCOUNT_ID" \
--budget '{
"BudgetName":"deepseek-code-agent-monthly",
"BudgetLimit":{"Amount":"500","Unit":"USD"},
"TimeUnit":"MONTHLY",
"BudgetType":"COST"
}'
DeepSeek pricing can change frequently. Verify current prices directly before finalizing the budget.
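To sanity-check a budget figure against expected traffic, a rough estimate can be computed from request volume and per-token prices. This is a sketch only: `monthly_cost_usd` is a hypothetical helper, and the prices in the usage line are placeholders, not DeepSeek's actual rates.

```python
def monthly_cost_usd(
    requests_per_day: int,
    prompt_tokens: int,
    completion_tokens: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
    days: int = 30,
) -> float:
    """Estimate monthly model spend from average per-request token counts."""
    requests = requests_per_day * days
    input_cost = requests * prompt_tokens / 1_000_000 * usd_per_million_input
    output_cost = requests * completion_tokens / 1_000_000 * usd_per_million_output
    return input_cost + output_cost


if __name__ == "__main__":
    # Placeholder prices (1.0 and 2.0 USD per million tokens) chosen only to show the arithmetic.
    estimate = monthly_cost_usd(1000, 3000, 1200, 1.0, 2.0)
    print(f"estimated monthly spend: {estimate:.2f} USD against a 500 USD budget")
```

The completion figure of 1200 matches the `max_tokens` cap in the app; the prompt figure is an assumed average and should be replaced with measured token counts from the usage table.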
AI Focus 11: What to validate before shipping for predictable operations (Ai Coding Agents)
- CloudWatch dashboards:
- requests/min
- p95 latency
- 4xx/5xx rates
- prompt/completion tokens
- Alarms:
- sustained 429s
- Lambda errors
- unexpected token surge
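One way to feed the dashboards and alarms above from Lambda is the CloudWatch embedded metric format, where a structured JSON log line is turned into metrics. The sketch below builds such a line; the namespace, metric names, and the `Team` dimension are assumptions chosen for illustration, and `emf_record` is a hypothetical helper.

```python
import json
import time
from typing import Optional


def emf_record(
    team: str,
    latency_ms: int,
    prompt_tokens: int,
    completion_tokens: int,
    status_code: int,
    now: Optional[float] = None,
) -> str:
    """Build one embedded-metric-format log line for a review request."""
    ts_ms = int((now if now is not None else time.time()) * 1000)
    return json.dumps({
        "_aws": {
            "Timestamp": ts_ms,
            "CloudWatchMetrics": [{
                "Namespace": "DeepSeekCodeAgent",  # hypothetical namespace
                "Dimensions": [["Team"]],
                "Metrics": [
                    {"Name": "LatencyMs", "Unit": "Milliseconds"},
                    {"Name": "PromptTokens", "Unit": "Count"},
                    {"Name": "CompletionTokens", "Unit": "Count"},
                ],
            }],
        },
        # Dimension value and metric values live at the top level of the record.
        "Team": team,
        "LatencyMs": latency_ms,
        "PromptTokens": prompt_tokens,
        "CompletionTokens": completion_tokens,
        # Not declared as a metric, so it stays a searchable log field only.
        "StatusCode": status_code,
    })


if __name__ == "__main__":
    # In Lambda, printing the line to stdout sends it to CloudWatch Logs.
    print(emf_record("platform", 850, 2100, 400, 200))
```

With latency published this way, p95 can be read as a percentile statistic on `LatencyMs` rather than computed in the application.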
AI Focus 12: Tradeoffs that matter in production for exam and field confidence (Ai Coding Agents)
- Secret key stored only in Secrets Manager
- Legacy model names removed before 2026-07-24 deadline
- API + app-level throttling enabled
- Prompt size and token caps enforced
- Cost budget alarms active
- Audit records include actor, model, token usage, and latency
- Runbook for DeepSeek API outage and fallback model tested
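The fallback item in the checklist above is easier to test when the fallback order is explicit in code. A minimal sketch, assuming the actual API call is injected as a function so the ordering logic can be exercised without network access; `complete_with_fallback` is a hypothetical helper.

```python
from typing import Callable, Sequence, Tuple


def complete_with_fallback(models: Sequence[str], call: Callable[[str], str]) -> Tuple[str, str]:
    """Try each model in order and return (model_used, answer) from the first success."""
    last_error = None
    for model in models:
        try:
            return model, call(model)
        except Exception as exc:  # in real code, narrow this to timeout and 5xx errors
            last_error = exc
    raise RuntimeError("all models failed") from last_error


if __name__ == "__main__":
    # Stand-in for the real client call: a lookup that fails for the first model.
    answers = {"backup-model": "looks fine"}
    print(complete_with_fallback(["primary-model", "backup-model"], answers.__getitem__))
```

Returning the model that actually answered lets the audit record show when fallback was used.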
AI Focus 13: Implementation details that change outcomes for cleaner ownership (Ai Coding Agents)
For an internal coding assistant, this architecture gives strong security and cost control early. Keep model routing policy-driven, instrument token usage from day one, and treat rate limiting as a first-class reliability control.
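"Policy-driven" routing can be as small as a data object plus one pure function. The sketch below uses the two model IDs named in this guide; the size threshold, the sensitive path fragments, and the names `RoutingPolicy` and `choose_model` are hypothetical values to be tuned per team.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RoutingPolicy:
    default_model: str = "deepseek-v4-flash"
    escalation_model: str = "deepseek-v4-pro"
    escalate_above_chars: int = 8000  # hypothetical size threshold
    escalate_paths: Tuple[str, ...] = ("auth/", "crypto/", "iam/")  # hypothetical sensitive paths


def choose_model(policy: RoutingPolicy, diff: str, force_escalate: bool = False) -> str:
    """Pick the cheaper default unless the diff is large, sensitive, or explicitly escalated."""
    if force_escalate or len(diff) > policy.escalate_above_chars:
        return policy.escalation_model
    if any(fragment in diff for fragment in policy.escalate_paths):
        return policy.escalation_model
    return policy.default_model


if __name__ == "__main__":
    print(choose_model(RoutingPolicy(), "+++ b/auth/session.py"))
```

Keeping the policy as data means thresholds can move to configuration without touching the request handler.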
AI Focus 14: Runtime checks you should not skip for measurable outcomes (Ai Coding Agents)
As of May 14, 2026, DeepSeek API documentation lists deepseek-v4-flash and deepseek-v4-pro as primary model IDs. Legacy names deepseek-chat and deepseek-reasoner are marked for deprecation on July 24, 2026. Always verify current model IDs and pricing before rollout:
- https://api-docs.deepseek.com/
- https://api-docs.deepseek.com/quick_start/pricing/
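A small startup check can catch a legacy model name in the secret before it reaches production. This sketch hard-codes the two legacy names mentioned above; `check_model_config` is a hypothetical helper, and the list should be re-verified against the DeepSeek documentation.

```python
# Legacy names as stated in this guide; verify against the current API docs.
LEGACY_MODELS = {"deepseek-chat", "deepseek-reasoner"}


def check_model_config(cfg: dict) -> str:
    """Return the configured model ID, or raise if it is missing or a legacy name."""
    model = cfg.get("DEEPSEEK_MODEL", "")
    if not model:
        raise ValueError("DEEPSEEK_MODEL is not set")
    if model in LEGACY_MODELS:
        raise ValueError(f"{model} is a legacy model name; update the secret")
    return model


if __name__ == "__main__":
    print(check_model_config({"DEEPSEEK_MODEL": "deepseek-v4-flash"}))
```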
AI Focus 15: How this maps to real exam objectives for fewer incident surprises (Ai Coding Agents)
Internal coding assistants can quickly drift into risky behavior:
- leaking code to external systems without policy control
- unbounded token usage and rising cost
- noisy suggestions with no observability
- no rate controls, creating downstream API instability
The objective is a secure internal assistant API with controlled egress, scoped secrets, request quotas, and measurable quality/cost metrics.
AI Focus 16: Failure modes and quick prevention for this workload (Ai Coding Agents)
Recommended baseline (cost-aware and secure)
- API Gateway HTTP API + JWT auth + WAF rate limit
- Lambda (FastAPI + Mangum) for inference orchestration
- DynamoDB for request cache and usage counters
- Secrets Manager for DeepSeek API key
- CloudWatch for metrics/alerts
This pattern minimizes fixed cost and is easier to operate than always-on clusters for early-stage usage.
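With a JWT authorizer on the HTTP API, the caller's identity can come from verified claims in the Lambda event instead of a client-supplied header. A minimal sketch, assuming the HTTP API payload format where claims appear under `requestContext.authorizer.jwt.claims`; `actor_from_event` is a hypothetical helper and the choice of the `sub` claim is an assumption.

```python
def actor_from_event(event: dict, claim: str = "sub") -> str:
    """Read the actor identity from claims that API Gateway has already verified."""
    try:
        claims = event["requestContext"]["authorizer"]["jwt"]["claims"]
    except (KeyError, TypeError):
        raise PermissionError("request did not pass through a JWT authorizer") from None
    actor = claims.get(claim)
    if not actor:
        raise PermissionError(f"claim {claim!r} is missing")
    return str(actor)


if __name__ == "__main__":
    sample = {"requestContext": {"authorizer": {"jwt": {"claims": {"sub": "emp-42"}}}}}
    print(actor_from_event(sample))
```

Under Mangum, the raw event is typically reachable from a FastAPI handler as `request.scope["aws.event"]`; treat that as something to confirm against the installed Mangum version.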
