FinOps

How to Cut ECS Fargate Costs Aggressively: The “Crazy but Useful” Playbook

May 20, 2026·15 min read

A team running ECS on Fargate wants aggressive cost reduction by combining workload scheduling, architecture cleanup, and capacity strategy without weakening core production paths.

Cost Optimization

Scope

This guide covers Fargate cost drivers, scale-to-zero approaches, Spot usage boundaries, ARM64 migration, task right-sizing, NAT and IPv4 reduction, ALB consolidation, and logging/image pull optimization.

How to use this guide

Apply changes in phases: start with safe scheduling and right-sizing, then add Spot and networking optimizations, and finally evaluate advanced architectural shifts for always-on dense workloads.


ECS Fargate cost optimization is not one trick. It is a stacking game. You reduce task size, then runtime hours, then architecture cost, then capacity type, then network/logging waste. When stacked together, the savings can feel “exponential” because every layer multiplies the previous one.

The real Fargate bill comes from five places:

Fargate cost =
  (requested vCPU x vCPU rate
 + requested memory x memory rate
 + extra ephemeral storage x storage rate)
  x task runtime, including image pull time
+ surrounding services: ALB, NAT, IPv4, CloudWatch Logs, VPC endpoints, data transfer

AWS bills Fargate from the moment the task starts downloading the container image until the task terminates, with per-second billing and a one-minute minimum for Linux containers. Fargate pricing also depends on vCPU, memory, OS, architecture, and configured storage.


1. The brutal truth: your “small” Fargate service may not be the expensive part

A tiny always-on Fargate task can be cheap. The dangerous part is the infrastructure you leave around it: ALB, NAT Gateway, public IPv4 addresses, CloudWatch Logs, and unused dev services.

AWS pricing examples show Linux/x86 in us-east-1 at $0.000011244 per vCPU-second and $0.000001235 per GB-second, while Linux/ARM is lower at $0.0000089944 per vCPU-second and $0.0000009889 per GB-second.

Example: one 0.5 vCPU / 1 GB x86 task running 24/7 for 30 days:

CPU:    0.5 * 0.000011244 * 2,592,000 sec = ~$14.57
Memory: 1.0 * 0.000001235 * 2,592,000 sec = ~$3.20

Total Fargate compute: ~$17.77/month

Three always-on services at that size: ~$53/month before ALB, NAT, logs, IPv4, storage, and data transfer.
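The arithmetic above is easy to script so you can plug in your own task sizes. This is a minimal sketch, assuming the us-east-1 Linux/x86 rates quoted earlier still apply (check the current pricing page); the function names are illustrative, not part of any AWS SDK.

```python
import math

# Linux/x86 us-east-1 rates quoted in this article; verify before relying on them.
X86_VCPU_PER_SECOND = 0.000011244
X86_GB_PER_SECOND = 0.000001235


def billed_seconds(runtime_seconds: float) -> int:
    """Per-second billing with the one-minute minimum for Linux tasks."""
    return max(60, math.ceil(runtime_seconds))


def fargate_compute_cost(vcpu: float, memory_gb: float, runtime_seconds: float,
                         vcpu_rate: float = X86_VCPU_PER_SECOND,
                         gb_rate: float = X86_GB_PER_SECOND) -> float:
    """Compute-only cost: no ALB, NAT, logs, IPv4, storage, or data transfer."""
    seconds = billed_seconds(runtime_seconds)
    return (vcpu * vcpu_rate + memory_gb * gb_rate) * seconds


MONTH_SECONDS = 30 * 24 * 3600  # 2,592,000 seconds, as in the example above

one_task = fargate_compute_cost(0.5, 1.0, MONTH_SECONDS)
print(f"one task:    ${one_task:.2f}/month")       # $17.77
print(f"three tasks: ${3 * one_task:.2f}/month")   # $53.32
```

Swap in the ARM rates or a smaller task size to see how each change moves the monthly number before you touch any infrastructure.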

Now stack optimizations:

Move | Cost effect
Right-size from 0.5 vCPU / 1 GB to 0.25 vCPU / 0.5 GB | ~50% lower
Move from x86 to ARM64 | ~20% lower in AWS’s US East example
Run dev/pg only 220 hours/month instead of 720 | ~69% lower for those envs
Put interruptible dev/batch tasks on Fargate Spot | up to 70% lower
Kill NAT/ALB/log waste | sometimes bigger than Fargate itself

Fargate Spot can run interruption-tolerant ECS tasks at up to 70% off regular Fargate, but AWS can reclaim capacity with a two-minute interruption warning.
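To see why stacking feels “exponential,” multiply what is left after each layer instead of adding the percentages. The sketch below assumes the layers are independent and uses the 70% Spot figure, which is an upper bound rather than a guaranteed discount, so treat the result as a best case for a dev environment.

```python
from functools import reduce


def remaining_fraction(reductions):
    """Fraction of the original cost left after applying each reduction in turn."""
    return reduce(lambda left, cut: left * (1.0 - cut), reductions, 1.0)


# Reductions taken from the table above, expressed as fractions.
dev_stack = {
    "right-size 0.5/1 GB -> 0.25/0.5 GB": 0.50,
    "x86 -> ARM64": 0.20,
    "220 of 720 hours": 1 - 220 / 720,
    "Fargate Spot (upper bound)": 0.70,
}

left = remaining_fraction(dev_stack.values())
print(f"remaining cost: {left:.1%} of the original")   # about 3.7%
print(f"on a $17.77 task: ${17.77 * left:.2f}/month")  # about $0.65
```

Production will not get the hours or Spot layers, so its stack stops after right-sizing and ARM64: 0.5 x 0.8 leaves 40% of the original compute cost.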


2. Hack one: scale dev, staging, preview, and admin environments to zero

This is the highest ROI move for your case.

For production, you probably keep at least one task warm. For dev, pg, preview, test, admin, or Selenium environments, running 24/7 is usually waste. ECS Service Auto Scaling supports a minimum capacity of 0, and AWS explicitly documents scale-to-zero for workloads with no work to do.

For your SmashTheExam-style setup:

prod: keep 1+ task running
dev: scale to 0 outside working windows
pg: scale to 0 unless being tested
selenium/test env: run task only on demand

PowerShell example:

$Cluster = "smashtheexam"
$Service = "smashtheexam-dev-service"
$Region  = "us-east-1"
$ResourceId = "service/$Cluster/$Service"

aws application-autoscaling register-scalable-target `
  --service-namespace ecs `
  --scalable-dimension ecs:service:DesiredCount `
  --resource-id $ResourceId `
  --min-capacity 0 `
  --max-capacity 2 `
  --region $Region

# Scale down every day at 23:00 UTC.
# The shorthand value is quoted so PowerShell passes it as a single argument
# instead of splitting it at the comma.
aws application-autoscaling put-scheduled-action `
  --service-namespace ecs `
  --scalable-dimension ecs:service:DesiredCount `
  --resource-id $ResourceId `
  --scheduled-action-name "dev-scale-down-night" `
  --schedule "cron(0 23 * * ? *)" `
  --scalable-target-action "MinCapacity=0,MaxCapacity=0" `
  --region $Region

# Scale up on workdays at 07:00 UTC
aws application-autoscaling put-scheduled-action `
  --service-namespace ecs `
  --scalable-dimension ecs:service:DesiredCount `
  --resource-id $ResourceId `
  --scheduled-action-name "dev-scale-up-workday" `
  --schedule "cron(0 7 ? * MON-FRI *)" `
  --scalable-target-action "MinCapacity=1,MaxCapacity=2" `
  --region $Region

AWS also has a full scheduled-scaling pattern combining ECS scheduled scaling, capacity providers, and Spot to reduce cost.
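Before trusting a savings estimate, check how many hours your schedule actually leaves the service running. A rough sketch, assuming one up/down window per working day and an average month of 52/12 weeks; the helper names are made up for illustration.

```python
def weekly_on_hours(up_hour: int, down_hour: int, days_per_week: int) -> int:
    """Hours per week a service runs with one up/down window per working day."""
    if not 0 <= up_hour < down_hour <= 24:
        raise ValueError("expected 0 <= up_hour < down_hour <= 24")
    return (down_hour - up_hour) * days_per_week


def monthly_on_hours(weekly_hours: float) -> float:
    """Average month, taken as 52 weeks / 12 months."""
    return weekly_hours * 52 / 12


# The schedule above: up at 07:00, down at 23:00, Monday to Friday.
wide = monthly_on_hours(weekly_on_hours(7, 23, 5))
# A tighter window: up at 08:00, down at 18:00, Monday to Friday.
tight = monthly_on_hours(weekly_on_hours(8, 18, 5))

for label, hours in (("07:00-23:00", wide), ("08:00-18:00", tight)):
    print(f"{label}: {hours:.0f} h/month, {1 - hours / 720:.0%} lower than 24/7")
```

The 07:00 to 23:00 window works out to roughly 347 hours a month, about 52% lower than always-on. Getting near the 220-hour figure in the earlier table takes a window closer to ten hours per workday.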


3. The “crazy people online” scale-to-zero HTTP trick

For HTTP apps behind an ALB, scaling to zero creates a problem: who receives the first request and wakes ECS back up?

People online try patterns like:

User -> ALB -> Lambda "booting page"
              |
              +-> Lambda updates ECS desired count from 0 to 1
              |
              +-> user refreshes after task becomes healthy

This is a real pattern discussed in AWS re:Post and Reddit: use Lambda as a temporary ALB target, show “service is loading,” then wake ECS. But the same discussions correctly warn that it adds complexity and may not be worth it compared to one tiny warm container.

My production-grade version:

For prod:
  Do NOT do full scale-to-zero unless traffic is very low and cold starts are acceptable.

For dev/preview:
  Yes. Put a Lambda/CloudFront landing page in front.
  Let it wake ECS using UpdateService.
  Auto-shutdown after inactivity.

For internal tools:
  Excellent hack. Users can tolerate 1–3 minutes of boot.
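Here is what the wake-up Lambda could look like for a dev environment. It is a sketch under stated assumptions, not a finished design: the cluster and service names are the ones used earlier in this post, the function needs ecs:DescribeServices and ecs:UpdateService permissions, and switching the ALB rule between the Lambda target and the ECS target group is left out entirely.

```python
CLUSTER = "smashtheexam"              # names reused from the examples above
SERVICE = "smashtheexam-dev-service"


def wake_service(ecs, cluster: str, service: str) -> bool:
    """Raise desiredCount to 1 if the service is at zero. Returns True if it woke it."""
    described = ecs.describe_services(cluster=cluster, services=[service])
    if described["services"][0]["desiredCount"] == 0:
        ecs.update_service(cluster=cluster, service=service, desiredCount=1)
        return True
    return False


def booting_response(retry_after_seconds: int = 30) -> dict:
    """Response in the shape an ALB expects from a Lambda target."""
    return {
        "statusCode": 503,
        "statusDescription": "503 Service Unavailable",
        "isBase64Encoded": False,
        "headers": {
            "Content-Type": "text/html; charset=utf-8",
            "Retry-After": str(retry_after_seconds),
        },
        "body": "<html><body><h1>Service is loading</h1>"
                "<p>Refresh in a minute or two.</p></body></html>",
    }


def handler(event, context):
    import boto3  # bundled with the Lambda Python runtime

    wake_service(boto3.client("ecs"), CLUSTER, SERVICE)
    return booting_response()
```

Passing the ECS client into wake_service keeps the logic testable without AWS credentials. How this interacts with a scheduled action that pins MaxCapacity to 0 is something to test in your own account before relying on it.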

4. Hack two: use Fargate Spot with a safe base strategy

Do not put everything on Spot blindly. The smart pattern is:

base = 1 on FARGATE
overflow = FARGATE_SPOT

Example:

# Each strategy item is quoted so PowerShell does not split it at the commas.
aws ecs update-service `
  --cluster smashtheexam `
  --service smashtheexam-dev-service `
  --capacity-provider-strategy `
    "capacityProvider=FARGATE,base=1,weight=1" `
    "capacityProvider=FARGATE_SPOT,weight=4" `
  --force-new-deployment `
  --region us-east-1

Meaning:

Keep the first task stable on normal Fargate.
Send most extra tasks to Spot.

AWS ECS capacity providers allow mixing FARGATE and FARGATE_SPOT, with base and weight controlling placement. At least one provider must have weight greater than zero, and Spot interruptions send a two-minute warning.
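To get a feel for what base=1 with weights 1:4 does at different task counts, here is a rough model. It is an approximation: the base tasks go to On-Demand first and the rest are split by weight, but the rounding for counts that do not divide evenly is my assumption, not documented ECS behavior.

```python
def split_tasks(desired: int, base_on_demand: int = 1,
                weight_on_demand: int = 1, weight_spot: int = 4):
    """Approximate (on_demand, spot) task counts for a base + weight strategy."""
    base = min(desired, base_on_demand)
    remaining = desired - base
    total_weight = weight_on_demand + weight_spot
    # Assumption: tasks left over after the proportional split land on On-Demand.
    # Real ECS placement for non-divisible counts may differ.
    spot = remaining * weight_spot // total_weight
    on_demand = base + (remaining - spot)
    return on_demand, spot


for desired in (1, 3, 6, 11):
    on_demand, spot = split_tasks(desired)
    print(f"{desired:>2} tasks -> {on_demand} on FARGATE, {spot} on FARGATE_SPOT")
```

The point to notice is the first row: with a single task, nothing runs on Spot at all, so this strategy only saves money once the service scales past its base.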

Use this for:

Workload | Spot?
Dev environment | Yes
Preview env | Yes
Selenium/testing jobs | Yes
Batch workers | Yes
Stateless API extra capacity | Yes, with base on-demand
Main production single task | Usually no
DB, stateful, migration task | No

5. Hack three: move x86 images to ARM64 Graviton

This is boring but powerful. AWS’s own Fargate pricing example shows ARM Linux rates lower than x86 Linux in us-east-1: CPU drops from $0.000011244 to $0.0000089944 per vCPU-second, and memory drops from $0.000001235 to $0.0000009889 per GB-second.

Task definition setting:

{
  "runtimePlatform": {
    "cpuArchitecture": "ARM64",
    "operatingSystemFamily": "LINUX"
  }
}

Docker build:

docker buildx build `
  --platform linux/amd64,linux/arm64 `
  -t 324025606669.dkr.ecr.us-east-1.amazonaws.com/smashtheexam-dev/backend:latest `
  --push .

Use ARM64 if:

Your Python/FastAPI app has no x86-only native dependency.
Your nginx/frontend image supports ARM.
Your CI can build multi-arch images.

Avoid ARM64 if:

You depend on old binary wheels.
You use native libraries not published for ARM.
You cannot test the image before deployment.
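One cheap smoke test before and after the switch is to have the container report which architecture it is really running on. A small sketch using only the standard library; the alias table is my own and maps common machine strings to the two values the runtimePlatform setting accepts.

```python
import platform

# Common platform.machine() values mapped to ECS cpuArchitecture names.
_ALIASES = {
    "aarch64": "ARM64",
    "arm64": "ARM64",
    "x86_64": "X86_64",
    "amd64": "X86_64",
}


def ecs_cpu_architecture(machine: str = "") -> str:
    """Translate a machine string (default: this host) to an ECS architecture name."""
    raw = (machine or platform.machine()).lower()
    return _ALIASES.get(raw, f"UNKNOWN:{raw}")


if __name__ == "__main__":
    print(ecs_cpu_architecture())
```

Run it inside the built image, for example as a one-off task or a startup log line, and compare the result with the cpuArchitecture in the task definition.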

6. Hack four: right-size Fargate like a maniac

Fargate charges for what you request, not what you use. If you request 1 vCPU / 2 GB and your app uses 0.07 vCPU / 200 MB, you are donating money.

AWS recommends choosing Fargate task sizes by summing required reservations and rounding up to the nearest valid Fargate size. AWS Compute Optimizer can also recommend ECS task CPU/memory and container CPU/memory sizes for Fargate services.
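The "round up to the nearest valid size" step can be sketched like this. The size table reflects the Linux Fargate CPU/memory combinations as I understand them, so verify it against the current task-size documentation; the 30% headroom factor is an arbitrary assumption you should tune.

```python
# CPU units -> allowed memory values in MiB (assumed current Linux Fargate sizes).
FARGATE_SIZES = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
    8192: list(range(16384, 61440 + 1, 4096)),
    16384: list(range(32768, 122880 + 1, 8192)),
}


def smallest_fargate_size(peak_vcpu: float, peak_memory_mib: float,
                          headroom: float = 1.3):
    """Smallest valid (cpu_units, memory_mib) covering observed peaks plus headroom."""
    need_cpu = peak_vcpu * 1024 * headroom
    need_memory = peak_memory_mib * headroom
    for cpu_units in sorted(FARGATE_SIZES):
        if cpu_units < need_cpu:
            continue
        for memory in FARGATE_SIZES[cpu_units]:
            if memory >= need_memory:
                return cpu_units, memory
    raise ValueError("workload does not fit any Fargate task size")


# The over-provisioned example above: 0.07 vCPU and about 200 MB observed.
print(smallest_fargate_size(0.07, 200))   # (256, 512)
```

Feed it the Maximum values from CloudWatch rather than the averages, otherwise the result will be sized for your quietest hour.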

Best practical targets:

Environment | Suggested start
Tiny FastAPI backend | 0.25 vCPU / 0.5 GB
Angular/nginx frontend | 0.25 vCPU / 0.5 GB
Combined nginx + backend dev task | 0.25–0.5 vCPU / 0.5–1 GB
Production API with moderate traffic | 0.5 vCPU / 1 GB, then measure
Heavy AI/model workload | Fargate may be the wrong platform

Run this audit:

aws ecs describe-task-definition `
  --task-definition smashtheexam-dev `
  --region us-east-1 `
  --query "taskDefinition.{cpu:cpu,memory:memory,containers:containerDefinitions[].{name:name,cpu:cpu,memory:memory,memoryReservation:memoryReservation}}"

Then compare with CloudWatch:

aws cloudwatch get-metric-statistics `
  --namespace AWS/ECS `
  --metric-name CPUUtilization `
  --dimensions Name=ClusterName,Value=smashtheexam Name=ServiceName,Value=smashtheexam-dev-service `
  --statistics Average Maximum `
  --period 300 `
  --start-time (Get-Date).AddDays(-7).ToUniversalTime().ToString("s") `
  --end-time (Get-Date).ToUniversalTime().ToString("s") `
  --region us-east-1

7. Hack five: kill NAT Gateway waste

This one is huge.

A NAT Gateway charges per hour and per GB processed. AWS recommends reducing NAT data charges by keeping resources in the same AZ as the NAT Gateway or by using interface/gateway endpoints for AWS services that support them.

The trap:

Private Fargate task -> NAT Gateway -> ECR/S3/Secrets Manager/CloudWatch

You pay NAT hourly cost, NAT data processing, and possibly cross-AZ data transfer.

Better options:

Pattern | Use when
S3 Gateway Endpoint | Your tasks pull or write lots of S3 data
ECR interface endpoints | Private tasks pull images from ECR frequently
Secrets Manager endpoint | Many task startups fetch secrets
No NAT for dev | Dev does not need outbound internet
Public subnet + locked SG | Only for low-risk dev, but watch IPv4 charges
IPv6-only / dualstack | Advanced, but can reduce IPv4 dependency

S3 Gateway Endpoints have no additional endpoint charge and allow S3 access from a VPC without an internet gateway or NAT device.

But do the math. Reddit and AWS community discussions repeatedly show people discovering that many interface endpoints can cost more than one NAT Gateway for tiny workloads. The sane rule:

High S3/DynamoDB traffic -> gateway endpoints are obvious.
High ECR/Secrets/CloudWatch startup traffic -> interface endpoints may help.
Tiny dev environment -> sometimes scheduled NAT deletion or no NAT is cheaper.
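"Do the math" can be as small as the sketch below. The rates are us-east-1 list prices as I recall them ($0.045 per NAT hour and per GB, $0.01 per endpoint-AZ hour and per GB), so treat them as assumptions and check the VPC and PrivateLink pricing pages. It also assumes the endpoints would replace the NAT Gateway completely, which only holds if nothing else needs internet egress.

```python
HOURS_PER_MONTH = 720

# Assumed us-east-1 list prices; verify before using the result.
NAT_HOURLY, NAT_PER_GB = 0.045, 0.045
ENDPOINT_HOURLY, ENDPOINT_PER_GB = 0.01, 0.01


def nat_gateway_monthly(gb_processed: float, gateways: int = 1) -> float:
    return gateways * NAT_HOURLY * HOURS_PER_MONTH + gb_processed * NAT_PER_GB


def interface_endpoints_monthly(gb_processed: float, endpoints: int, azs: int) -> float:
    return endpoints * azs * ENDPOINT_HOURLY * HOURS_PER_MONTH + gb_processed * ENDPOINT_PER_GB


def break_even_gb(endpoints: int, azs: int, gateways: int = 1) -> float:
    """Monthly traffic above which the endpoints become cheaper than the NAT."""
    fixed_gap = interface_endpoints_monthly(0, endpoints, azs) - nat_gateway_monthly(0, gateways)
    return fixed_gap / (NAT_PER_GB - ENDPOINT_PER_GB)


# Example: 4 interface endpoints (say ECR API, ECR Docker, Logs, Secrets Manager) in 2 AZs.
print(f"NAT, 50 GB:        ${nat_gateway_monthly(50):.2f}")                 # $34.65
print(f"endpoints, 50 GB:  ${interface_endpoints_monthly(50, 4, 2):.2f}")   # $58.10
print(f"break-even:        {break_even_gb(4, 2):.0f} GB/month")             # 720
```

At 50 GB a month the single NAT Gateway wins, which matches what people keep discovering for tiny workloads. The picture flips once traffic or NAT count grows.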

8. Hack six: stop assigning public IPv4 to every Fargate task

AWS charges for every public IPv4 address ($0.005 per address-hour, roughly $3.60 per month each), and a Fargate task needs some route to its image registry: a public IP in a public subnet, or a NAT Gateway or ECR VPC endpoints in a private subnet.

Bad pattern:

7 services
7 Fargate tasks
7 public IPv4 addresses
1 ALB
Mostly idle

Better pattern:

ALB has public entry.
Fargate tasks stay private.
Tasks pull ECR through endpoints or controlled NAT.
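A quick way to size this line item, assuming the $0.005 per address-hour public IPv4 list price and a 720-hour month; the function is illustrative only.

```python
def public_ipv4_monthly(addresses: int, hourly_rate: float = 0.005,
                        hours: int = 720) -> float:
    """Monthly charge for in-use public IPv4 addresses at an assumed hourly rate."""
    return addresses * hourly_rate * hours


bad_pattern = public_ipv4_monthly(7)      # one public IP per always-on task
better_pattern = public_ipv4_monthly(0)   # tasks stay private behind the ALB

print(f"7 task IPs: ${bad_pattern:.2f}/month")   # $25.20
print(f"0 task IPs: ${better_pattern:.2f}/month")
```

For seven mostly idle tasks, that is more than the compute cost of one small always-on task from the earlier example. The replacement path (NAT or endpoints) has its own cost, so compare both before switching.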

IPv6-only ECS Fargate exists in supported regions, but it has sharp edges: ECS docs say IPv6-only services need dualstack load balancers with IPv6 target groups, and IPv4-only endpoints require DNS64/NAT64.

For your project, I would not start with IPv6-only for prod. I would first:

1. Remove public IP from private tasks.
2. Share one ALB across services.
3. Add S3 gateway endpoint.
4. Decide NAT vs ECR endpoints using real Cost Explorer data.
5. Later test IPv6-only in dev.

9. Hack seven: share one ALB instead of creating load balancer clones

ALBs look cheap until you leave many idle. AWS pricing examples show an ALB hourly charge plus LCU charge; one AWS example totals $22.42/month for a modest ALB in us-east-1.

AWS ECS docs explicitly mention that internal load balancers add cost, but you can reduce overhead by sharing an ALB across multiple services using path-based routing.

Good pattern:

https://www.smashtheexam.com/        -> frontend target group
https://www.smashtheexam.com/api/*   -> backend target group
https://dev.smashtheexam.com/*       -> dev target group
https://pg.smashtheexam.com/*        -> pg target group

Bad pattern:

prod ALB
dev ALB
pg ALB
admin ALB
selenium ALB
temporary ALB forgotten for 8 months

Online AWS communities regularly complain about idle ALBs becoming zombie costs. One recent Reddit example described forgotten test ALBs sitting idle for months and suggested Lambda-based zombie detection.
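A zombie check does not need to be clever. The sketch below assumes you have already summed each load balancer's RequestCount metric over a lookback window (for example 14 days) into a dictionary; the ALB names and the $0.0225 hourly rate are illustrative assumptions.

```python
def find_zombie_albs(request_counts: dict, min_requests: int = 1) -> list:
    """ALB names whose total requests over the lookback window are below the threshold."""
    return sorted(name for name, total in request_counts.items()
                  if total < min_requests)


def monthly_fixed_cost(alb_count: int, hourly_rate: float = 0.0225,
                       hours: int = 720) -> float:
    """Hourly charge only; an idle ALB consumes almost no LCUs."""
    return alb_count * hourly_rate * hours


# Hypothetical 14-day request totals per load balancer.
observed = {
    "prod-alb": 1_200_000,
    "dev-alb": 340,
    "selenium-alb": 0,
    "temp-migration-test": 0,
}

zombies = find_zombie_albs(observed)
print(zombies)                                              # ['selenium-alb', 'temp-migration-test']
print(f"${monthly_fixed_cost(len(zombies)):.2f}/month idle")  # $32.40
```

Review the list by hand before deleting anything: an ALB with zero requests may still be the entry point for an environment that is scaled to zero on purpose.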


10. Hack eight: CloudWatch Logs can silently become your tax

Fargate itself may be optimized, then logs eat the budget.

CloudWatch Logs Infrequent Access has lower ingestion pricing but fewer features; AWS says Standard is for frequently accessed logs and IA is for ad-hoc/forensic logs. After a log group is created, its log class cannot be changed.

Practical setup:

Log group | Retention | Class
/ecs/prod/backend | 30–90 days | Standard
/ecs/prod/nginx | 14–30 days | Standard or IA
/ecs/dev/* | 3–7 days | IA if rarely queried
/ecs/selenium/* | 1–3 days | IA or export to S3

PowerShell:

aws logs put-retention-policy `
  --log-group-name "/ecs/smashtheexam-dev/backend" `
  --retention-in-days 7 `
  --region us-east-1

Also reduce noisy logs:

Do not log every health check.
Do not log full request/response bodies.
Do not log bot traffic at INFO.
Sample high-volume access logs.
Push detailed traces only when debugging.
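For a FastAPI app served by uvicorn, the first item on that list can be handled with a standard logging filter. A minimal sketch, assuming the health check hits /health (use whatever path your target group is configured with) and that access lines follow uvicorn's default "GET /path HTTP/1.1" format.

```python
import logging


class DropHealthChecks(logging.Filter):
    """Suppress access-log records for the given request paths."""

    def __init__(self, paths=("/health",)):
        super().__init__()
        self.paths = tuple(paths)

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        # Match the path surrounded by spaces, as in: "GET /health HTTP/1.1"
        return not any(f" {path} " in message for path in self.paths)


# uvicorn writes request lines to the "uvicorn.access" logger.
logging.getLogger("uvicorn.access").addFilter(DropHealthChecks())
```

With an ALB checking every 30 seconds from two AZs, this removes thousands of identical lines per task per day. Requests with a query string on the health path would still be logged, so extend the match if your checks use one.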

11. Hack nine: shrink images because Fargate bills while pulling

Fargate does not cache container image layers on the underlying single-use host. AWS says the whole image must be pulled for each Fargate task, and image pull time directly affects task startup time.

This matters for cost because Fargate billing starts when image download starts. Smaller/faster images reduce cold-start cost and improve autoscaling.

Do this:

# Bad
FROM python:3.12
COPY . .
RUN pip install -r requirements.txt
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0"]

# Better
FROM python:3.12-slim-bookworm AS runtime

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app ./app

USER 10001

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

For large images over ~250 MB, AWS recommends considering SOCI lazy loading. Fargate can use SOCI indexes to start containers without waiting for the full image to download, and AWS docs say SOCI is supported on Linux Fargate platform 1.4.0+ for x86_64 and ARM64.


12. Hack ten: stop splitting tiny things into too many Fargate services

Fargate has task-level CPU and memory billing. If you split three tiny containers into three services, each one gets its own minimum task shape. If they are tightly coupled and scale together, combine them.

AWS says containers in the same Fargate task share task resources and always run on the same host.

Good combination:

nginx reverse proxy + app container
app + lightweight sidecar
worker + tiny helper

Bad combination:

frontend + backend + worker + admin all in one task

Rule:

Combine only when lifecycle and scaling are identical.
Split when scaling, security, deployment, or failure domains differ.

13. Hack eleven: use RunTask for jobs, not always-on services

Many people deploy background jobs as ECS services with one task always waiting. That is waste.

Better:

EventBridge schedule -> ECS RunTask
SQS message depth -> scale workers
GitHub/GitLab deploy -> temporary migration task
Manual admin action -> one-off ECS task

AWS’s own ECS cost checklist recommends scheduled/task-based patterns for batch workloads so tasks run only when needed instead of sitting idle.

For example, a sitemap generator, article builder, Selenium crawler, or cleanup job should not be a permanent service.
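A one-off job launch could be sketched as follows. The task definition name, subnet ID, and security group ID in the usage note are placeholders, not real resources, and the ECS client is passed in so the request-building logic can be checked without AWS access.

```python
def build_run_task_request(cluster: str, task_definition: str,
                           subnets, security_groups, use_spot: bool = True) -> dict:
    """Keyword arguments for the ECS RunTask call for a single private task."""
    provider = "FARGATE_SPOT" if use_spot else "FARGATE"
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "count": 1,
        # launchType is left out on purpose: it cannot be combined
        # with a capacity provider strategy.
        "capacityProviderStrategy": [{"capacityProvider": provider, "weight": 1}],
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                "securityGroups": list(security_groups),
                "assignPublicIp": "DISABLED",
            }
        },
    }


def run_job(ecs, request: dict) -> list:
    """Start the task and return its ARNs, raising if ECS reports failures."""
    response = ecs.run_task(**request)
    failures = response.get("failures", [])
    if failures:
        raise RuntimeError(f"RunTask failed: {failures}")
    return [task["taskArn"] for task in response["tasks"]]
```

Usage would look like run_job(boto3.client("ecs"), build_run_task_request("smashtheexam", "sitemap-generator", ["subnet-EXAMPLE"], ["sg-EXAMPLE"])). Spot is a reasonable default here only because these jobs can be retried; pass use_spot=False for a migration task.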


14. Hack twelve: when Fargate is no longer economical, switch only the hot path to ECS on EC2 Spot

This is the “crazy but mature” move.

Fargate is excellent when:

You want zero server management.
You have variable workloads.
You care more about operational simplicity than perfect bin-packing.

ECS on EC2 becomes attractive when:

You run many always-on containers.
You can bin-pack them tightly.
You can tolerate managing AMIs, patching, scaling, and draining.
You can use Graviton Spot instances.

AWS ECS capacity providers with EC2 Spot can optimize both cost and scale; AWS has an official pattern using capacity providers, managed scaling, and Spot interruption handling.

Best hybrid model:

Production baseline: ECS on EC2 Graviton Spot/Reserved or small On-Demand
Burst capacity: Fargate Spot
Critical fallback: Fargate On-Demand
Dev/test: Fargate scale-to-zero

Priority plan for SmashTheExam-style ECS Fargate cost reduction

Phase 1: immediate safe wins

Action | Risk | Expected impact
Scale dev/pg to zero outside usage | Low | Very high
Set dev log retention to 7 days | Low | Medium
Audit public IPv4 on tasks | Low | Medium
Share one ALB for prod/dev/pg if possible | Medium | High
Right-size task CPU/RAM | Medium | High

Phase 2: aggressive but clean

Action | Risk | Expected impact
Move images to ARM64 | Medium | ~20% compute reduction
Add Fargate Spot for dev/test/workers | Medium | Up to 70% for those tasks
Add S3 Gateway Endpoint | Low | High if S3 traffic exists
Kill NAT for idle dev | Medium | High
Convert scheduled jobs to RunTask | Low | High

Phase 3: crazy advanced mode

Action | Risk | Expected impact
Lambda wake page for scale-to-zero HTTP dev env | Medium/high | Very high
IPv6-only ECS dev experiment | High | Medium
ECS on EC2 Spot for dense always-on workloads | High | Very high at scale
SOCI lazy loading for large images | Medium | Startup/cold-scale savings

My recommended final architecture

                         CloudFront
                             |
                    +--------+--------+
                    |                 |
              Static Angular      ALB / API
              S3 or cached        shared ALB
                                      |
                  +-------------------+------------------+
                  |                   |                  |
              prod API            dev API             pg API
           Fargate ARM64      Fargate ARM64      Fargate ARM64
           min 1 task         scheduled 0/1      scheduled 0/1
           on-demand base     Spot allowed       Spot allowed
                  |
             workers/jobs
          EventBridge/SQS -> ECS RunTask on Fargate Spot

The killer formula:

Keep prod stable.
Make everything else disappear when unused.
Use ARM64 everywhere.
Use Spot only where interruption is acceptable.
Remove NAT/ALB/log/public-IP waste.

For your case, the biggest immediate win is probably not a tiny CPU tweak. It is making dev and pg behave like disposable environments: wake when needed, sleep when idle, and never keep their own expensive networking stack alive for no reason.
