Kubernetes Cost Optimization: Our Playbook
Practical strategies we used to cut our Kubernetes spending by 60%.
Kubernetes is powerful, but it's also expensive if left unmanaged. After auditing our own infrastructure and that of 15 clients, we found that the average Kubernetes cluster wastes 40-65% of its compute budget on over-provisioned resources, idle nodes, and inefficient scheduling. This article shares the exact playbook we use to cut Kubernetes costs without sacrificing reliability.
1. Right-Size Your Workloads
The single biggest source of waste is over-provisioned resource requests. Developers set CPU and memory requests based on peak usage (or just guess), and Kubernetes dutifully reserves those resources even when they're 90% idle. We use the Vertical Pod Autoscaler (VPA) in recommendation mode to analyze actual resource usage over 14 days, then adjust requests to the P95 value with a 20% buffer.
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off" # Recommendation only — we review before applying
  resourcePolicy:
    containerPolicies:
      - containerName: api-server
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2000m
          memory: 4Gi
```

In one client's cluster, we found that their API pods were requesting 2 CPU cores and 4GB memory but actually using 0.3 cores and 800MB on average. Right-sizing just this one deployment freed up 40% of their cluster capacity.
2. Implement Cluster Autoscaling
Many teams run fixed-size clusters sized for peak traffic. This means they're paying for peak capacity 24/7, even though most applications see 3-5x traffic variation between peak and off-peak hours. The Kubernetes Cluster Autoscaler, combined with Horizontal Pod Autoscaler (HPA), automatically scales both pods and nodes based on actual demand.
Configure the Cluster Autoscaler with --scale-down-delay-after-add=10m and --scale-down-unneeded-time=10m to prevent thrashing. Aggressive scale-down can cause cascading failures during traffic spikes.
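Pairing the two autoscalers might look like the following minimal HPA manifest. This is a sketch, not one of our client configs: the Deployment name api-server, the replica bounds, and the 70% CPU target are illustrative values you would tune per workload.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # scale out before pods saturate
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300 # wait out brief dips to avoid flapping
```

The HPA adds or removes pods; when pending pods no longer fit, the Cluster Autoscaler adds nodes, and the scale-down flags above control how eagerly it removes them.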
3. Use Spot/Preemptible Instances
Spot instances (AWS) / Preemptible VMs (GCP) cost 60-90% less than on-demand instances. The tradeoff is that they can be reclaimed with two minutes' notice. For stateless workloads (web servers, batch processing, CI/CD runners), this is an excellent tradeoff.
Our approach: run the baseline load on on-demand instances (or reserved instances for predictable workloads) and burst capacity on spot instances. We use Karpenter on AWS for intelligent spot instance selection across multiple instance types and availability zones.
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-burst
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - m5.xlarge
            - m5a.xlarge
            - m6i.xlarge
            - c5.xlarge # Diversify for spot availability
            - c5a.xlarge
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "100"
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 60s
```

4. Namespace-Level Cost Allocation
You can't optimize what you can't measure. We deploy Kubecost or OpenCost in every cluster to provide namespace-level cost breakdowns. This shows exactly which team, service, or environment is consuming resources. The visibility alone drives optimization — when teams see their monthly spend, they start right-sizing voluntarily.
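Both tools aggregate spend by namespace and by labels, so a consistent labeling convention is what makes the reports actionable. A minimal sketch, assuming a hypothetical team label convention (the namespace and label values are illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: checkout
  labels:
    team: payments # cost reports can be grouped by this label
    env: production
```

Applied consistently, every line in the cost report maps to an owner, which is what turns visibility into accountability.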
5. Optimize Storage Costs
Persistent volumes are often overlooked in cost optimization. Teams provision 100GB GP3 volumes for databases that use 5GB, and those volumes persist even after the pods are deleted. We implement automated PV monitoring that alerts when volume utilization drops below 20%, and we regularly clean up orphaned volumes from deleted namespaces.
- Audit all PersistentVolumeClaims monthly — delete orphaned volumes from deleted namespaces
- Use appropriate storage classes: gp3 for general workloads, io2 only for latency-sensitive databases
- Enable EBS snapshot lifecycle policies to avoid accumulating old snapshots
- Consider EFS for shared storage instead of provisioning separate EBS volumes per pod
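To make gp3 the default path of least resistance, we can define a storage class with volume expansion enabled so teams start small and grow volumes rather than over-provisioning up front. A sketch, assuming the AWS EBS CSI driver is installed (the class name gp3-default is illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-default
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
reclaimPolicy: Delete # release the EBS volume when its PVC is deleted
allowVolumeExpansion: true # grow later instead of provisioning 100GB on day one
volumeBindingMode: WaitForFirstConsumer # bind in the pod's AZ
```

Note that reclaimPolicy: Delete trades durability for cost hygiene; keep Retain (plus snapshots) for storage classes backing databases.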
Results: A Real Cost Breakdown
Here's the actual cost reduction we achieved for a mid-size SaaS client running 12 microservices on EKS:
- Right-sizing workloads: -28% ($4,200/month saved)
- Cluster autoscaling (night/weekend scale-down): -18% ($2,700/month saved)
- Spot instances for stateless workloads: -22% ($3,300/month saved)
- Storage optimization: -8% ($1,200/month saved)
- Reserved instances for baseline: -12% ($1,800/month saved)
- Total reduction: $15,000 → $5,800/month (61% savings; the line items were measured independently and overlap, so they sum to more than the combined saving)
“We were spending more on Kubernetes than our entire engineering team's coffee budget. After Vaarak's optimization, we're spending less than our Slack subscription. And our applications are actually more reliable because the autoscaling handles traffic spikes better than our old fixed-size cluster.”
— Jamie Torres, CTO at CloudSync
Cost optimization isn't a one-time project — it's an ongoing practice. We recommend scheduling quarterly cost reviews, setting up cost anomaly alerts, and making cost visibility a standard part of your engineering dashboards. The cloud providers are always adding new instance types, pricing models, and services that can reduce costs further.
Sarah Chen
Cloud Infrastructure Architect