Cloud
10 min read
Feb 28, 2026

We Cut Our Kubernetes Bill by 60%: Here's How

A deep dive into right-sizing, spot instances, cluster autoscaling, and the tools that made the biggest difference.

By TechWithCare Engineering

The Problem: Overprovisioned Everything

When we audited a client's Kubernetes infrastructure, the numbers were staggering. They were spending $20K/month on AWS EKS, but actual resource utilisation averaged just 15% for CPU and 22% for memory. Pods were requesting 4x what they needed, and nodes were sized for peak loads that happened once a week.

This is incredibly common. Teams set resource requests and limits during initial deployment, then never revisit them. The default instinct is to overprovision — nobody wants to be the person whose service crashed because it ran out of memory.

The fix isn't just 'use smaller instances.' It's a systematic approach to right-sizing at every layer of the stack.

Step 1: Right-Size Your Pods

We started with Vertical Pod Autoscaler (VPA) in recommendation mode. After two weeks of collecting metrics, we had data-driven resource recommendations for every workload. The results were eye-opening: most services needed 60-75% less CPU than they were requesting.
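Running the VPA in recommendation mode means it records suggested requests without touching live pods. A minimal sketch of that setup, assuming a hypothetical Deployment named `payments-api`:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payments-api-vpa        # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api          # hypothetical workload
  updatePolicy:
    updateMode: "Off"           # recommendation-only: collect suggestions, change nothing
```

After a couple of weeks, `kubectl describe vpa payments-api-vpa` shows the accumulated recommendations, which you can then apply by hand at whatever pace suits your rollout process.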

We applied the recommendations incrementally, starting with non-critical services. For each service, we set requests to the P95 usage (what the service actually needs 95% of the time) and limits to 2x the request (headroom for spikes).

The key insight: separate your request and limit strategy for CPU vs memory. CPU is compressible — if a pod hits its CPU limit, it gets throttled but keeps running. Memory is not — if a pod exceeds its memory limit, it gets OOM-killed. So be generous with memory limits but tight with CPU requests.
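Putting the P95/2x rule and the CPU-vs-memory asymmetry together, a pod spec ends up looking something like this (the numbers are illustrative, not recommendations):

```yaml
# Container resources following the strategy above:
# requests at observed P95, limits at roughly 2x for headroom,
# with memory treated more generously than CPU.
resources:
  requests:
    cpu: 250m        # tight: P95 of observed CPU usage
    memory: 512Mi    # P95 of observed memory usage
  limits:
    cpu: 500m        # 2x request; exceeding this only throttles
    memory: 1Gi      # 2x request; exceeding this OOM-kills the pod, so keep it roomy
```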

Step 2: Spot Instances for Stateless Workloads

After right-sizing, we moved all stateless workloads to spot instances. With proper pod disruption budgets and multi-AZ deployment, spot interruptions become a non-event. We use a diversified instance strategy — spreading across 8-10 instance types — to reduce interruption frequency.
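Two pieces of configuration make this safe: a PodDisruptionBudget that caps how many replicas can be evicted at once, and a node selector that steers the workload onto spot capacity. A sketch, assuming a hypothetical `web-frontend` service on EKS (the `eks.amazonaws.com/capacityType` label applies to EKS managed node groups; other setups label spot nodes differently):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-frontend-pdb
spec:
  minAvailable: 2              # never evict below 2 replicas during interruptions
  selector:
    matchLabels:
      app: web-frontend
---
# In the Deployment's pod template: schedule onto spot nodes only
spec:
  nodeSelector:
    eks.amazonaws.com/capacityType: SPOT
```

With replicas spread across AZs and instance types, a spot reclaim takes out at most one replica at a time, and the scheduler replaces it elsewhere before the PDB is ever at risk.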

The savings were dramatic: 65-70% reduction in compute costs for stateless services. Combined with right-sizing, the client's monthly bill dropped from $20K to $8K — a 60% reduction with zero impact on reliability.

For stateful workloads (databases, message queues), we kept on-demand instances but right-sized them aggressively. Reserved instances or savings plans for these predictable workloads added another 20% savings.

Step 3: Cluster Autoscaling Done Right

The final piece was configuring Karpenter (a more flexible alternative to the Cluster Autoscaler) to dynamically provision nodes based on actual pod scheduling needs. Karpenter is smarter about bin-packing and can provision exactly the right instance type for pending pods.
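A minimal Karpenter NodePool along these lines might look as follows. Field names have shifted across Karpenter versions (this sketch follows the `karpenter.sh/v1` shape), so treat it as a starting point rather than a drop-in config:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        # Prefer spot, fall back to on-demand when spot is unavailable
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default            # assumes a matching EC2NodeClass exists
  disruption:
    # Consolidate underutilized nodes to keep bin-packing tight
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
```

Leaving the instance-type requirement unconstrained (or listing many types) is what gives Karpenter room to diversify across spot pools and pick the cheapest fit for pending pods.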

We also implemented scheduled scaling for predictable traffic patterns. The client's traffic peaked during business hours, so we pre-scale 15 minutes before the morning rush and scale down aggressively after hours. This eliminates cold-start latency while keeping costs low during off-peak.
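One common way to implement this kind of scheduled pre-scaling is a CronJob that raises the HPA's replica floor before the rush and lowers it after hours. A sketch with hypothetical names and times (the service account needs RBAC permission to patch HPAs, and tools like KEDA's cron scaler offer the same effect declaratively):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: morning-prescale        # hypothetical
spec:
  schedule: "45 8 * * 1-5"      # 08:45 on weekdays, 15 min before a 09:00 rush
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: hpa-scaler   # hypothetical SA with patch rights on HPAs
          restartPolicy: OnFailure
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - kubectl
                - patch
                - hpa
                - web-frontend               # hypothetical HPA name
                - --patch
                - '{"spec":{"minReplicas":10}}'
```

A mirror-image CronJob in the evening drops `minReplicas` back down, letting the autoscaler shed nodes overnight.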

The result: infrastructure that breathes with demand, costs that track usage, and a team that no longer dreads the monthly AWS bill.
