How to Cut ECS Fargate Costs Aggressively: The “Crazy but Useful” Playbook
Scenario
A team running ECS on Fargate wants aggressive cost reduction by combining workload scheduling, architecture cleanup, and capacity strategy without weakening core production paths.
Scope
This guide covers Fargate cost drivers, scale-to-zero approaches, Spot usage boundaries, ARM64 migration, task right-sizing, NAT and IPv4 reduction, ALB consolidation, and logging/image pull optimization.
How to use this guide
Apply changes in phases: start with safe scheduling and right-sizing, then add Spot and networking optimizations, and finally evaluate advanced architectural shifts for always-on dense workloads.
ECS Fargate cost optimization is not one trick. It is a stacking game. You reduce task size, then runtime hours, then architecture cost, then capacity type, then network/logging waste. When stacked together, the savings can feel “exponential” because every layer multiplies the previous one.
The real Fargate bill comes from five places:
Fargate cost =
(requested vCPU + requested memory + extra ephemeral storage)
× task runtime, including image pull time
+ surrounding services: ALB, NAT, IPv4, CloudWatch Logs, VPC endpoints, data transfer
AWS bills Fargate from the moment the task starts downloading the container image until the task terminates, with per-second billing and a one-minute minimum for Linux containers. Fargate pricing also depends on vCPU, memory, OS, architecture, and configured storage.
1. The brutal truth: your “small” Fargate service may not be the expensive part
A tiny always-on Fargate task can be cheap. The dangerous part is the infrastructure you leave around it: ALB, NAT Gateway, public IPv4 addresses, CloudWatch Logs, and unused dev services.
AWS pricing examples show Linux/x86 in us-east-1 at $0.000011244 per vCPU-second and $0.000001235 per GB-second, while Linux/ARM is lower at $0.0000089944 per vCPU-second and $0.0000009889 per GB-second.
Example: one 0.5 vCPU / 1 GB x86 task running 24/7 for 30 days:
CPU: 0.5 * 0.000011244 * 2,592,000 sec = ~$14.57
Memory: 1.0 * 0.000001235 * 2,592,000 sec = ~$3.20
Total Fargate compute: ~$17.77/month
Three always-on services at that size: ~$53/month before ALB, NAT, logs, IPv4, storage, and data transfer.
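The arithmetic above is easy to reproduce, which helps when you want to plug in your own task sizes. The following is a minimal sketch, not an official calculator: the rates are the us-east-1 Linux/x86 figures quoted in this guide (check current pricing), the function name `monthly_compute_cost` is made up for illustration, and the one-minute minimum and extra ephemeral storage are ignored.

```python
# Rates quoted in this guide for Linux/x86 in us-east-1 (verify against current pricing).
X86_VCPU_PER_SEC = 0.000011244
X86_GB_PER_SEC = 0.000001235


def monthly_compute_cost(vcpu, memory_gb, hours=720,
                         vcpu_rate=X86_VCPU_PER_SEC, gb_rate=X86_GB_PER_SEC):
    """Fargate compute cost in USD for one task running `hours` hours."""
    seconds = hours * 3600
    return (vcpu * vcpu_rate + memory_gb * gb_rate) * seconds


one_task = monthly_compute_cost(0.5, 1.0)
print(f"one task: ${one_task:.2f}, three tasks: ${3 * one_task:.2f}")
```

Passing `hours=220` or the ARM rates to the same function gives a quick before/after comparison for a single service.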
Now stack optimizations:
| Move | Cost effect |
|---|---|
| Right-size from 0.5 vCPU / 1 GB to 0.25 vCPU / 0.5 GB | ~50% lower |
| Move from x86 to ARM64 | ~20% lower in AWS’s US East example |
| Run dev/pg only 220 hours/month instead of 720 | ~69% lower for those envs |
| Put interruptible dev/batch tasks on Fargate Spot | up to 70% lower |
| Kill NAT/ALB/log waste | sometimes bigger than Fargate itself |
Fargate Spot can run interruption-tolerant ECS tasks at up to 70% off regular Fargate, but AWS can reclaim capacity with a two-minute interruption warning.
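To see why stacking matters, multiply the fraction of cost that remains after each layer. This is a rough sketch under optimistic assumptions: it takes the table's figures at face value, treats the Spot discount as the full "up to 70%" (a ceiling, not a guarantee), and applies only to an environment where every layer is actually usable, such as dev.

```python
import math


def stacked_multiplier(multipliers):
    """Multiply the remaining-cost fraction of each optimization layer."""
    return math.prod(multipliers)


# Remaining cost fraction per layer, taken from the table above.
layers = {
    "right-size 0.5/1 -> 0.25/0.5": 0.50,
    "x86 -> ARM64": 0.80,
    "run 220 of 720 hours": 220 / 720,
    "Fargate Spot, best case": 0.30,
}

remaining = stacked_multiplier(layers.values())
print(f"remaining compute cost: {remaining:.1%} of the original")
```

Under these assumptions a dev environment ends up at a few percent of its original compute bill. Production, which keeps its hours and its on-demand base, only gets the first two layers.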
2. Hack one: scale dev, staging, preview, and admin environments to zero
This is the highest ROI move for your case.
For production, you probably keep at least one task warm. For dev, pg, preview, test, admin, or Selenium environments, running 24/7 is usually waste. ECS Service Auto Scaling supports a minimum capacity of 0, and AWS explicitly documents scale-to-zero for workloads with no work to do.
For your SmashTheExam-style setup:
prod: keep 1+ task running
dev: scale to 0 outside working windows
pg: scale to 0 unless being tested
selenium/test env: run task only on demand
PowerShell example:
$Cluster = "smashtheexam"
$Service = "smashtheexam-dev-service"
$Region = "us-east-1"
$ResourceId = "service/$Cluster/$Service"
aws application-autoscaling register-scalable-target `
--service-namespace ecs `
--scalable-dimension ecs:service:DesiredCount `
--resource-id $ResourceId `
--min-capacity 0 `
--max-capacity 2 `
--region $Region
# Scale down every day at 23:00 UTC
aws application-autoscaling put-scheduled-action `
--service-namespace ecs `
--scalable-dimension ecs:service:DesiredCount `
--resource-id $ResourceId `
--scheduled-action-name "dev-scale-down-night" `
--schedule "cron(0 23 * * ? *)" `
--scalable-target-action "MinCapacity=0,MaxCapacity=0" `
--region $Region
# Scale up on workdays at 07:00 UTC
aws application-autoscaling put-scheduled-action `
--service-namespace ecs `
--scalable-dimension ecs:service:DesiredCount `
--resource-id $ResourceId `
--scheduled-action-name "dev-scale-up-workday" `
--schedule "cron(0 7 ? * MON-FRI *)" `
--scalable-target-action "MinCapacity=1,MaxCapacity=2" `
--region $Region
AWS also has a full scheduled-scaling pattern combining ECS scheduled scaling, capacity providers, and Spot to reduce cost.
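Before trusting a schedule, it is worth checking how many hours it really leaves running. The sketch below is a simple estimate, assuming the service runs exactly between the scale-up and scale-down times on workdays and stays at zero on weekends; the helper name is illustrative.

```python
def weekly_running_hours(up_hour, down_hour, workdays=5):
    """Hours per week a service runs if it starts at up_hour and stops at down_hour (same day, UTC)."""
    if not 0 <= up_hour < down_hour <= 24:
        raise ValueError("expected 0 <= up_hour < down_hour <= 24")
    return (down_hour - up_hour) * workdays


weekly = weekly_running_hours(up_hour=7, down_hour=23)  # the 07:00-23:00 schedule above
monthly = weekly * 30 / 7                               # averaged over a 30-day month
print(f"{weekly} h/week, about {monthly:.0f} h/month out of 720")
```

The 07:00 to 23:00 window works out to roughly 343 hours a month, well above the 220 hours used in the savings table. A tighter window such as 08:00 to 18:00 lands near 214 hours, so pick the cron times to match the number you are budgeting for.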
3. The “crazy people online” scale-to-zero HTTP trick
For HTTP apps behind an ALB, scaling to zero creates a problem: who receives the first request and wakes ECS back up?
People online try patterns like:
User -> ALB -> Lambda "booting page"
                 |
                 +-> Lambda updates ECS desired count from 0 to 1
                 |
                 +-> user refreshes after task becomes healthy
This is a real pattern discussed in AWS re:Post and Reddit: use Lambda as a temporary ALB target, show “service is loading,” then wake ECS. But the same discussions correctly warn that it adds complexity and may not be worth it compared to one tiny warm container.
My production-grade version:
For prod:
Do NOT do full scale-to-zero unless traffic is very low and cold starts are acceptable.
For dev/preview:
Yes. Put a Lambda/CloudFront landing page in front.
Let it wake ECS using UpdateService.
Auto-shutdown after inactivity.
For internal tools:
Excellent hack. Users can tolerate 1–3 minutes of boot.
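For the dev/preview case, the wake-up Lambda can be very small. Here is a hedged sketch of one possible handler, not a reference implementation: it assumes the Lambda is registered as an ALB target, uses the cluster and service names from this guide, and leaves out everything around it (switching the listener back to the ECS target group, the inactivity shutdown, and any interaction with scheduled scaling limits). The ECS client is passed in so the logic can be exercised with a stub.

```python
import json


def wake_service(ecs, cluster, service):
    """Set desiredCount to 1 if the service is scaled to zero.

    `ecs` is a boto3 ECS client (or any object with the same two methods).
    Returns True when a wake-up was requested.
    """
    described = ecs.describe_services(cluster=cluster, services=[service])
    current = described["services"][0]["desiredCount"]
    if current == 0:
        ecs.update_service(cluster=cluster, service=service, desiredCount=1)
        return True
    return False


def build_alb_response(woken):
    """Response in the shape an ALB expects from a Lambda target."""
    message = "Waking the service, retry shortly." if woken else "Service is starting."
    return {
        "statusCode": 503,
        "statusDescription": "503 Service Unavailable",
        "isBase64Encoded": False,
        "headers": {"Content-Type": "application/json", "Retry-After": "60"},
        "body": json.dumps({"message": message}),
    }


def lambda_handler(event, context):
    import boto3  # imported here so the helpers above can be tested without AWS access
    ecs = boto3.client("ecs")
    woken = wake_service(ecs, "smashtheexam", "smashtheexam-dev-service")
    return build_alb_response(woken)
```

The function's role needs permission for `ecs:DescribeServices` and `ecs:UpdateService` on that service. Returning a 503 with `Retry-After` keeps the behaviour honest for scripts and monitors; a friendlier HTML page works just as well for humans.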
4. Hack two: use Fargate Spot with a safe base strategy
Do not put everything on Spot blindly. The smart pattern is:
base = 1 on FARGATE
overflow = FARGATE_SPOT
Example:
aws ecs update-service `
--cluster smashtheexam `
--service smashtheexam-dev-service `
--capacity-provider-strategy `
"capacityProvider=FARGATE,base=1,weight=1" `
"capacityProvider=FARGATE_SPOT,weight=4" `
--force-new-deployment `
--region us-east-1
Meaning:
Keep the first task stable on normal Fargate.
Send most extra tasks to Spot.
AWS ECS capacity providers allow mixing FARGATE and FARGATE_SPOT, with base and weight controlling placement. At least one provider must have weight greater than zero, and Spot interruptions send a two-minute warning.
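A quick way to build intuition for base and weight is to simulate the split. The sketch below is an approximation: base tasks are placed first and the rest are divided in proportion to the weights, which matches the documented intent, but the way leftovers are rounded here (heaviest provider first) is an assumption of this example, not documented ECS behaviour.

```python
def distribute_tasks(total, strategy):
    """Approximate how `total` tasks split across capacity providers.

    `strategy` maps provider name -> (base, weight).
    """
    counts = {name: 0 for name in strategy}
    remaining = total
    # Base tasks are satisfied before any weighting applies.
    for name, (base, _) in strategy.items():
        placed = min(base, remaining)
        counts[name] += placed
        remaining -= placed
    total_weight = sum(weight for _, weight in strategy.values())
    if remaining and total_weight:
        shares = {n: remaining * w // total_weight for n, (_, w) in strategy.items()}
        leftover = remaining - sum(shares.values())
        # Assumption: tasks lost to integer division go to the heaviest providers.
        heaviest_first = sorted(strategy, key=lambda n: strategy[n][1], reverse=True)
        for name in heaviest_first[:leftover]:
            shares[name] += 1
        for name, share in shares.items():
            counts[name] += share
    return counts


strategy = {"FARGATE": (1, 1), "FARGATE_SPOT": (0, 4)}
print(distribute_tasks(6, strategy))
```

With the strategy from the command above, a single task always lands on regular Fargate, and six tasks split into two on-demand and four on Spot.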
Use this for:
| Workload | Spot? |
|---|---|
| Dev environment | Yes |
| Preview env | Yes |
| Selenium/testing jobs | Yes |
| Batch workers | Yes |
| Stateless API extra capacity | Yes, with base on-demand |
| Main production single task | Usually no |
| DB, stateful, migration task | No |
5. Hack three: move x86 images to ARM64 Graviton
This is boring but powerful. AWS’s own Fargate pricing example shows ARM Linux rates lower than x86 Linux in us-east-1: CPU drops from $0.000011244 to $0.0000089944 per vCPU-second, and memory drops from $0.000001235 to $0.0000009889 per GB-second.
Task definition setting:
{
"runtimePlatform": {
"cpuArchitecture": "ARM64",
"operatingSystemFamily": "LINUX"
}
}
Docker build:
docker buildx build `
--platform linux/amd64,linux/arm64 `
-t 324025606669.dkr.ecr.us-east-1.amazonaws.com/smashtheexam-dev/backend:latest `
--push .
Use ARM64 if:
Your Python/FastAPI app has no x86-only native dependency.
Your nginx/frontend image supports ARM.
Your CI can build multi-arch images.
Avoid ARM64 if:
You depend on old binary wheels.
You use native libraries not published for ARM.
You cannot test the image before deployment.
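One cheap way to test the image before deployment is a smoke script that runs inside the built container and confirms the architecture it actually landed on. This is only a sketch with made-up helper names; a real check would also import your native dependencies so that missing ARM wheels fail in CI rather than in ECS.

```python
import platform

# Common spellings reported by platform.machine() on Linux and macOS.
_ARCH_ALIASES = {
    "x86_64": "amd64", "amd64": "amd64",
    "aarch64": "arm64", "arm64": "arm64",
}


def normalize_arch(machine):
    """Map a platform.machine() value to the Docker-style architecture name."""
    return _ARCH_ALIASES.get(machine.lower(), machine.lower())


def assert_arch(expected, machine=None):
    """Raise if the interpreter is not running on the expected architecture."""
    actual = normalize_arch(machine or platform.machine())
    if actual != expected:
        raise RuntimeError(f"expected {expected}, running on {actual}")
    return actual


if __name__ == "__main__":
    print(normalize_arch(platform.machine()))
```

In CI this could run as the container's command for the `linux/arm64` variant, calling `assert_arch("arm64")` before the image is promoted.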
6. Hack four: right-size Fargate like a maniac
Fargate charges for what you request, not what you use. If you request 1 vCPU / 2 GB and your app uses 0.07 vCPU / 200 MB, you are donating money.
AWS recommends choosing Fargate task sizes by summing required reservations and rounding up to the nearest valid Fargate size. AWS Compute Optimizer can also recommend ECS task CPU/memory and container CPU/memory sizes for Fargate services.
Best practical targets:
| Environment | Suggested start |
|---|---|
| Tiny FastAPI backend | 0.25 vCPU / 0.5 GB |
| Angular/nginx frontend | 0.25 vCPU / 0.5 GB |
| Combined nginx + backend dev task | 0.25–0.5 vCPU / 0.5–1 GB |
| Production API with moderate traffic | 0.5 vCPU / 1 GB, then measure |
| Heavy AI/model workload | Fargate may be the wrong platform |
Run this audit:
aws ecs describe-task-definition `
--task-definition smashtheexam-dev `
--region us-east-1 `
--query "taskDefinition.{cpu:cpu,memory:memory,containers:containerDefinitions[].{name:name,cpu:cpu,memory:memory,memoryReservation:memoryReservation}}"
Then compare with CloudWatch:
aws cloudwatch get-metric-statistics `
--namespace AWS/ECS `
--metric-name CPUUtilization `
--dimensions Name=ClusterName,Value=smashtheexam Name=ServiceName,Value=smashtheexam-dev-service `
--statistics Average Maximum `
--period 300 `
--start-time (Get-Date).AddDays(-7).ToUniversalTime().ToString("yyyy-MM-ddTHH:mm:ssZ") `
--end-time (Get-Date).ToUniversalTime().ToString("yyyy-MM-ddTHH:mm:ssZ") `
--region us-east-1
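Once you have peak CPU and memory numbers, rounding up to a valid Fargate size is mechanical. The sketch below assumes the standard CPU/memory combinations up to 4 vCPU (larger sizes are left out for brevity) and a 30% headroom factor, which is an arbitrary starting point rather than an AWS recommendation. It picks the smallest CPU tier first, which is usually, but not provably, the cheapest fit.

```python
# Valid Fargate CPU units -> allowed memory values in MiB (subset up to 4 vCPU).
FARGATE_SIZES = {
    256: [512, 1024, 2048],
    512: [1024 * n for n in range(1, 5)],     # 1-4 GB
    1024: [1024 * n for n in range(2, 9)],    # 2-8 GB
    2048: [1024 * n for n in range(4, 17)],   # 4-16 GB
    4096: [1024 * n for n in range(8, 31)],   # 8-30 GB
}


def smallest_fargate_size(cpu_units, memory_mib, headroom=1.3):
    """Return the smallest (cpu, memory) pair covering peak usage plus headroom."""
    need_cpu = cpu_units * headroom
    need_mem = memory_mib * headroom
    for cpu in sorted(FARGATE_SIZES):
        if cpu < need_cpu:
            continue
        for memory in FARGATE_SIZES[cpu]:
            if memory >= need_mem:
                return cpu, memory
    raise ValueError("workload exceeds the sizes listed in this sketch")


# Peak of 0.07 vCPU (about 72 CPU units) and 200 MiB, as in the example above.
print(smallest_fargate_size(72, 200))
```

Feed it the Maximum statistic from CloudWatch rather than the Average, and look at memory utilization as well as CPU, otherwise a bursty service gets sized for its quiet hours.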
7. Hack five: kill NAT Gateway waste
This one is huge.
A NAT Gateway charges per hour and per GB processed. AWS recommends reducing NAT data charges by keeping resources in the same AZ as the NAT Gateway or by using interface/gateway endpoints for AWS services that support them.
The trap:
Private Fargate task -> NAT Gateway -> ECR/S3/Secrets Manager/CloudWatch
You pay NAT hourly cost, NAT data processing, and possibly cross-AZ data transfer.
Better options:
| Pattern | Use when |
|---|---|
| S3 Gateway Endpoint | Your tasks pull or write lots of S3 data |
| ECR interface endpoints | Private tasks pull images from ECR frequently |
| Secrets Manager endpoint | Many task startups fetch secrets |
| No NAT for dev | Dev does not need outbound internet |
| Public subnet + locked SG | Only for low-risk dev, but watch IPv4 charges |
| IPv6-only / dualstack | Advanced, but can reduce IPv4 dependency |
S3 Gateway Endpoints have no additional endpoint charge and allow S3 access from a VPC without an internet gateway or NAT device.
But do the math. Reddit and AWS community discussions repeatedly show people discovering that many interface endpoints can cost more than one NAT Gateway for tiny workloads. The sane rule:
High S3/DynamoDB traffic -> gateway endpoints are obvious.
High ECR/Secrets/CloudWatch startup traffic -> interface endpoints may help.
Tiny dev environment -> sometimes scheduled NAT deletion or no NAT is cheaper.
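"Do the math" can be a ten-line script. The sketch below compares one NAT Gateway with a set of interface endpoints. The default prices are assumptions based on commonly listed us-east-1 rates and must be checked against current pricing; it also assumes four endpoints across two AZs and that all NAT traffic could move to the endpoints, which is rarely fully true.

```python
HOURS_PER_MONTH = 720


def nat_monthly_cost(gb_processed, gateways=1, hourly=0.045, per_gb=0.045):
    """NAT Gateway cost: hourly charge per gateway plus per-GB processing."""
    return gateways * hourly * HOURS_PER_MONTH + gb_processed * per_gb


def interface_endpoints_monthly_cost(gb_processed, endpoints, azs,
                                     hourly=0.01, per_gb=0.01):
    """Interface endpoint cost: hourly charge per endpoint per AZ plus per-GB processing."""
    return endpoints * azs * hourly * HOURS_PER_MONTH + gb_processed * per_gb


for gb in (10, 500, 2000):
    nat = nat_monthly_cost(gb)
    vpce = interface_endpoints_monthly_cost(gb, endpoints=4, azs=2)
    cheaper = "NAT" if nat < vpce else "endpoints"
    print(f"{gb:>5} GB/month: NAT ${nat:.2f} vs endpoints ${vpce:.2f} -> {cheaper}")
```

With these assumed prices the crossover sits at a few hundred GB per month: below it, the fixed hourly cost of several endpoints outweighs the cheaper per-GB rate, which is exactly the surprise people report for tiny workloads.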
8. Hack six: stop assigning public IPv4 to every Fargate task
AWS charges for public IPv4 addresses. A Fargate task in a public subnet needs a public IP to pull images, while a task without one needs a route through NAT or private ECR endpoints.
Bad pattern:
7 services
7 Fargate tasks
7 public IPv4 addresses
1 ALB
Mostly idle
Better pattern:
ALB has public entry.
Fargate tasks stay private.
Tasks pull ECR through endpoints or controlled NAT.
IPv6-only ECS Fargate exists in supported regions, but it has sharp edges: ECS docs say IPv6-only services need dualstack load balancers with IPv6 target groups, and IPv4-only endpoints require DNS64/NAT64.
For your project, I would not start with IPv6-only for prod. I would first:
1. Remove public IP from private tasks.
2. Share one ALB across services.
3. Add S3 gateway endpoint.
4. Decide NAT vs ECR endpoints using real Cost Explorer data.
5. Later test IPv6-only in dev.
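To put a number on step 1, here is a tiny estimate of what per-task public addresses cost. The hourly rate is an assumption based on the commonly listed public IPv4 price and should be verified; the load balancer's own addresses are not counted.

```python
def public_ipv4_monthly_cost(addresses, hours=720, hourly_rate=0.005):
    """Monthly charge for public IPv4 addresses at an assumed hourly rate."""
    return addresses * hours * hourly_rate


always_on = public_ipv4_monthly_cost(7)             # one address per always-on task
scheduled = public_ipv4_monthly_cost(7, hours=220)  # same tasks, running 220 h/month
print(f"7 always-on tasks: ${always_on:.2f}/month, scheduled: ${scheduled:.2f}/month")
```

For seven mostly idle tasks the address charge alone is in the same range as a small Fargate task, which is why removing it belongs at the top of the list.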
9. Hack seven: share one ALB instead of creating load balancer clones
ALBs look cheap until you leave many idle. AWS pricing examples show an ALB hourly charge plus LCU charge; one AWS example totals $22.42/month for a modest ALB in us-east-1.
AWS ECS docs explicitly mention that internal load balancers add cost, but you can reduce overhead by sharing an ALB across multiple services using path-based routing.
Good pattern:
https://www.smashtheexam.com/ -> frontend target group
https://www.smashtheexam.com/api/* -> backend target group
https://dev.smashtheexam.com/* -> dev target group
https://pg.smashtheexam.com/* -> pg target group
Bad pattern:
prod ALB
dev ALB
pg ALB
admin ALB
selenium ALB
temporary ALB forgotten for 8 months
Online AWS communities regularly complain about idle ALBs becoming zombie costs. One recent Reddit example described forgotten test ALBs sitting idle for months and suggested Lambda-based zombie detection.
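The shared-ALB routing table above maps directly onto listener rules. The sketch below only builds the rule definitions as plain dictionaries in the shape used by the ELBv2 CreateRule API; it does not call AWS, and the target group ARNs are placeholders you would replace with real ones.

```python
def build_listener_rules(routes):
    """Turn (host, path, target_group_arn) tuples into ALB rule definitions.

    Rules are numbered in the order given; an ALB evaluates lower
    priorities first, so list more specific paths before catch-alls.
    """
    rules = []
    for priority, (host, path, target_group_arn) in enumerate(routes, start=1):
        rules.append({
            "Priority": priority,
            "Conditions": [
                {"Field": "host-header", "HostHeaderConfig": {"Values": [host]}},
                {"Field": "path-pattern", "PathPatternConfig": {"Values": [path]}},
            ],
            "Actions": [{"Type": "forward", "TargetGroupArn": target_group_arn}],
        })
    return rules


# Target group ARNs are placeholders for illustration.
routes = [
    ("www.smashtheexam.com", "/api/*", "arn:placeholder:backend"),
    ("www.smashtheexam.com", "/*", "arn:placeholder:frontend"),
    ("dev.smashtheexam.com", "/*", "arn:placeholder:dev"),
    ("pg.smashtheexam.com", "/*", "arn:placeholder:pg"),
]
for rule in build_listener_rules(routes):
    host = rule["Conditions"][0]["HostHeaderConfig"]["Values"][0]
    path = rule["Conditions"][1]["PathPatternConfig"]["Values"][0]
    print(rule["Priority"], host, path)
```

Each dictionary could be passed to boto3's `elbv2.create_rule` together with a `ListenerArn`, or translated into whatever infrastructure-as-code tool you already use.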
10. Hack eight: CloudWatch Logs can silently become your tax
Fargate itself may be optimized, then logs eat the budget.
CloudWatch Logs Infrequent Access has lower ingestion pricing but fewer features; AWS says Standard is for frequently accessed logs and IA is for ad-hoc/forensic logs. After a log group is created, its log class cannot be changed.
Practical setup:
| Log group | Retention | Class |
|---|---|---|
| /ecs/prod/backend | 30–90 days | Standard |
| /ecs/prod/nginx | 14–30 days | Standard or IA |
| /ecs/dev/* | 3–7 days | IA if rarely queried |
| /ecs/selenium/* | 1–3 days | IA or export to S3 |
PowerShell:
aws logs put-retention-policy `
--log-group-name "/ecs/smashtheexam-dev/backend" `
--retention-in-days 7 `
--region us-east-1
Also reduce noisy logs:
Do not log every health check.
Do not log full request/response bodies.
Do not log bot traffic at INFO.
Sample high-volume access logs.
Push detailed traces only when debugging.
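For a FastAPI app served by uvicorn, the first item on that list can be handled with a standard `logging.Filter`. This is a minimal sketch: `/health` is an assumed health-check path, and the substring match is deliberately crude, so it would also drop any other URL that contains the same text.

```python
import logging


class DropHealthChecks(logging.Filter):
    """Discard access-log records that mention a health-check path."""

    def __init__(self, paths=("/health",)):
        super().__init__()
        self.paths = tuple(paths)

    def filter(self, record):
        message = record.getMessage()
        # Returning False tells the logging framework to drop the record.
        return not any(path in message for path in self.paths)


# "uvicorn.access" is uvicorn's access logger; "/health" is an assumed path.
logging.getLogger("uvicorn.access").addFilter(DropHealthChecks())
```

With an ALB probing every few seconds per target, dropping these lines removes a steady stream of ingestion that nobody reads.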
11. Hack nine: shrink images because Fargate bills while pulling
Fargate does not cache container image layers on the underlying single-use host. AWS says the whole image must be pulled for each Fargate task, and image pull time directly affects task startup time.
This matters for cost because Fargate billing starts when image download starts. Smaller/faster images reduce cold-start cost and improve autoscaling.
Do this:
# Bad
FROM python:3.12
COPY . .
RUN pip install -r requirements.txt
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0"]
# Better
FROM python:3.12-slim-bookworm AS runtime
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app ./app
USER 10001
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
For large images over ~250 MB, AWS recommends considering SOCI lazy loading. Fargate can use SOCI indexes to start containers without waiting for the full image to download, and AWS docs say SOCI is supported on Linux Fargate platform 1.4.0+ for x86_64 and ARM64.
12. Hack ten: stop splitting tiny things into too many Fargate services
Fargate has task-level CPU and memory billing. If you split three tiny containers into three services, each one gets its own minimum task shape. If they are tightly coupled and scale together, combine them.
AWS says containers in the same Fargate task share task resources and always run on the same host.
Good combination:
nginx reverse proxy + app container
app + lightweight sidecar
worker + tiny helper
Bad combination:
frontend + backend + worker + admin all in one task
Rule:
Combine only when lifecycle and scaling are identical.
Split when scaling, security, deployment, or failure domains differ.
13. Hack eleven: use RunTask for jobs, not always-on services
Many people deploy background jobs as ECS services with one task always waiting. That is waste.
Better:
EventBridge schedule -> ECS RunTask
SQS message depth -> scale workers
GitHub/GitLab deploy -> temporary migration task
Manual admin action -> one-off ECS task
AWS’s own ECS cost checklist recommends scheduled/task-based patterns for batch workloads so tasks run only when needed instead of sitting idle.
For example, a sitemap generator, article builder, Selenium crawler, or cleanup job should not be a permanent service.
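Starting such a job on demand is a single RunTask call. The sketch below builds the request separately from sending it, so the shape can be checked without AWS access. The parameter names follow boto3's `run_task`; the task definition name, subnet IDs, and security group IDs in the usage lines are hypothetical.

```python
def build_run_task_request(cluster, task_definition, subnets, security_groups,
                           use_spot=True):
    """Build keyword arguments for the ECS RunTask API (boto3 `run_task`)."""
    provider = "FARGATE_SPOT" if use_spot else "FARGATE"
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "count": 1,
        # capacityProviderStrategy replaces launchType; the two cannot be combined.
        "capacityProviderStrategy": [{"capacityProvider": provider, "weight": 1}],
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                "securityGroups": list(security_groups),
                "assignPublicIp": "DISABLED",
            }
        },
    }


def run_job(ecs, **kwargs):
    """Start a one-off task; `ecs` is a boto3 ECS client or a compatible stub."""
    response = ecs.run_task(**build_run_task_request(**kwargs))
    return [task["taskArn"] for task in response.get("tasks", [])]


# Hypothetical names and IDs, for illustration only.
request = build_run_task_request(
    cluster="smashtheexam",
    task_definition="sitemap-generator",
    subnets=["subnet-EXAMPLE"],
    security_groups=["sg-EXAMPLE"],
)
print(request["capacityProviderStrategy"])
```

Because `assignPublicIp` is disabled here, the subnet needs NAT or the relevant VPC endpoints for the image pull to succeed. The same request body is what an EventBridge schedule or a deploy pipeline would effectively send.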
14. Hack twelve: when Fargate is no longer economical, switch only the hot path to ECS on EC2 Spot
This is the “crazy but mature” move.
Fargate is excellent when:
You want zero server management.
You have variable workloads.
You care more about operational simplicity than perfect bin-packing.
ECS on EC2 becomes attractive when:
You run many always-on containers.
You can bin-pack them tightly.
You can tolerate managing AMIs, patching, scaling, and draining.
You can use Graviton Spot instances.
AWS ECS capacity providers with EC2 Spot can optimize both cost and scale; AWS has an official pattern using capacity providers, managed scaling, and Spot interruption handling.
Best hybrid model:
Production baseline: ECS on EC2 Graviton Spot/Reserved or small On-Demand
Burst capacity: Fargate Spot
Critical fallback: Fargate On-Demand
Dev/test: Fargate scale-to-zero
Priority plan for SmashTheExam-style ECS Fargate cost reduction
Phase 1: immediate safe wins
| Action | Risk | Expected impact |
|---|---|---|
| Scale dev/pg to zero outside usage | Low | Very high |
| Set dev log retention to 7 days | Low | Medium |
| Audit public IPv4 on tasks | Low | Medium |
| Share one ALB for prod/dev/pg if possible | Medium | High |
| Right-size task CPU/RAM | Medium | High |
Phase 2: aggressive but clean
| Action | Risk | Expected impact |
|---|---|---|
| Move images to ARM64 | Medium | ~20% compute reduction |
| Add Fargate Spot for dev/test/workers | Medium | Up to 70% for those tasks |
| Add S3 Gateway Endpoint | Low | High if S3 traffic exists |
| Kill NAT for idle dev | Medium | High |
| Convert scheduled jobs to RunTask | Low | High |
Phase 3: crazy advanced mode
| Action | Risk | Expected impact |
|---|---|---|
| Lambda wake page for scale-to-zero HTTP dev env | Medium/high | Very high |
| IPv6-only ECS dev experiment | High | Medium |
| ECS on EC2 Spot for dense always-on workloads | High | Very high at scale |
| SOCI lazy loading for large images | Medium | Startup/cold-scale savings |
My recommended final architecture
                     CloudFront
                         |
                +--------+--------+
                |                 |
         Static Angular       ALB / API
          S3 or cached        shared ALB
                                  |
        +-------------------------+------------------------+
        |                         |                        |
    prod API                   dev API                  pg API
  Fargate ARM64             Fargate ARM64            Fargate ARM64
   min 1 task               scheduled 0/1            scheduled 0/1
  on-demand base            Spot allowed             Spot allowed

                         workers/jobs
        EventBridge/SQS -> ECS RunTask on Fargate Spot
The killer formula:
Keep prod stable.
Make everything else disappear when unused.
Use ARM64 everywhere.
Use Spot only where interruption is acceptable.
Remove NAT/ALB/log/public-IP waste.
For your case, the biggest immediate win is probably not a tiny CPU tweak. It is making dev and pg behave like disposable environments: wake when needed, sleep when idle, and never keep their own expensive networking stack alive for no reason.
References
- AWS Fargate Pricing
- Automatically scale your Amazon ECS service - Amazon Elastic Container Service
- Optimizing Amazon Elastic Container Service for cost using scheduled scaling | Containers
- ALB ECS scale tasks to zero and scale up via lambda | AWS re:Post
- Amazon ECS clusters for Fargate - Amazon Elastic Container Service
- Choosing Fargate task sizes for Amazon ECS - Amazon Elastic Container Service
- Pricing for NAT gateways - Amazon Virtual Private Cloud
- Gateway endpoints for Amazon S3 - Amazon Virtual Private Cloud
- IP addressing for your VPCs and subnets - Amazon Virtual Private Cloud
- Amazon ECS task networking options for Fargate - Amazon Elastic Container Service
- Elastic Load Balancing pricing
- Best practices for connecting Amazon ECS services in a VPC - Amazon Elastic Container Service
- ALBs look cheap until you forget 50 of them running idle
- Log classes - Amazon CloudWatch Logs
- Linux containers on Fargate container image pull behavior for Amazon ECS - Amazon Elastic Container Service
- Fargate security considerations for Amazon ECS - Amazon Elastic Container Service
- Cost Optimization Checklist for Amazon ECS and AWS Fargate | Containers
- Optimize cost for container workloads with ECS capacity providers and EC2 Spot Instances | Containers