Kubernetes Cost Optimization: Our Playbook
Practical strategies we used to cut our Kubernetes spending by 60%.
Kubernetes is powerful, but it's also expensive if left unmanaged. After auditing our own infrastructure and that of 15 clients, we found that the average Kubernetes cluster wastes 40-65% of its compute budget on over-provisioned resources, idle nodes, and inefficient scheduling. This article shares the exact playbook we use to cut Kubernetes costs without sacrificing reliability.
1. Right-Size Your Workloads
The single biggest source of waste is over-provisioned resource requests. Developers set CPU and memory requests based on peak usage (or just guess), and Kubernetes dutifully reserves those resources even when they're 90% idle. We use the Vertical Pod Autoscaler (VPA) in recommendation mode to analyze actual resource usage over 14 days, then adjust requests to the P95 value with a 20% buffer.
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off" # Recommendation only — we review before applying
  resourcePolicy:
    containerPolicies:
      - containerName: api-server
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2000m
          memory: 4Gi
```

In one client's cluster, we found that their API pods were requesting 2 CPU cores and 4GB memory but actually using 0.3 cores and 800MB on average. Right-sizing just this one deployment freed up 40% of their cluster capacity.
2. Implement Cluster Autoscaling
Many teams run fixed-size clusters sized for peak traffic. This means they're paying for peak capacity 24/7, even though most applications see 3-5x traffic variation between peak and off-peak hours. The Kubernetes Cluster Autoscaler, combined with Horizontal Pod Autoscaler (HPA), automatically scales both pods and nodes based on actual demand.
Configure the Cluster Autoscaler with --scale-down-delay-after-add=10m and --scale-down-unneeded-time=10m to prevent thrashing. Aggressive scale-down can cause cascading failures during traffic spikes.
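Pairing the two autoscalers might look like the following minimal HPA manifest. This is a sketch, not one of our client configs: the Deployment name api-server, the replica bounds, and the 70% CPU target are illustrative values you would tune per workload.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # scale out before pods saturate
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300 # wait out brief dips to avoid flapping
```

The HPA adds or removes pods; when pending pods no longer fit, the Cluster Autoscaler adds nodes, and the scale-down flags above control how eagerly it removes them.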
3. Use Spot/Preemptible Instances
Spot instances (AWS) / Preemptible VMs (GCP) cost 60-90% less than on-demand instances. The tradeoff is that they can be reclaimed with two minutes' notice. For stateless workloads (web servers, batch processing, CI/CD runners), this is an excellent tradeoff.
Our approach: run the baseline load on on-demand instances (or reserved instances for predictable workloads) and burst capacity on spot instances. We use Karpenter on AWS for intelligent spot instance selection across multiple instance types and availability zones.
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-burst
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - m5.xlarge
            - m5a.xlarge
            - m6i.xlarge
            - c5.xlarge # Diversify for spot availability
            - c5a.xlarge
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "100"
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 60s
```

4. Namespace-Level Cost Allocation
You can't optimize what you can't measure. We deploy Kubecost or OpenCost in every cluster to provide namespace-level cost breakdowns. This shows exactly which team, service, or environment is consuming resources. The visibility alone drives optimization — when teams see their monthly spend, they start right-sizing voluntarily.
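Both tools aggregate spend by namespace and by labels, so a consistent labeling convention is what makes the reports actionable. A minimal sketch, assuming a hypothetical team label convention (the namespace and label values are illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: checkout
  labels:
    team: payments # cost reports can be grouped by this label
    env: production
```

Applied consistently, every line in the cost report maps to an owner, which is what turns visibility into accountability.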
5. Optimize Storage Costs
Persistent volumes are often overlooked in cost optimization. Teams provision 100GB GP3 volumes for databases that use 5GB, and those volumes persist even after the pods are deleted. We implement automated PV monitoring that alerts when volume utilization drops below 20%, and we regularly clean up orphaned volumes from deleted namespaces.
- Audit all PersistentVolumeClaims monthly — delete orphaned volumes from deleted namespaces
- Use appropriate storage classes: gp3 for general workloads, io2 only for latency-sensitive databases
- Enable EBS snapshot lifecycle policies to avoid accumulating old snapshots
- Consider EFS for shared storage instead of provisioning separate EBS volumes per pod
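To make gp3 the default path of least resistance, we can define a storage class with volume expansion enabled so teams start small and grow volumes rather than over-provisioning up front. A sketch, assuming the AWS EBS CSI driver is installed (the class name gp3-default is illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-default
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
reclaimPolicy: Delete # release the EBS volume when its PVC is deleted
allowVolumeExpansion: true # grow later instead of provisioning 100GB on day one
volumeBindingMode: WaitForFirstConsumer # bind in the pod's AZ
```

Note that reclaimPolicy: Delete trades durability for cost hygiene; keep Retain (plus snapshots) for storage classes backing databases.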
Results: A Real Cost Breakdown
Here's the actual cost reduction we achieved for a mid-size SaaS client running 12 microservices on EKS:
- Right-sizing workloads: -28% ($4,200/month saved)
- Cluster autoscaling (night/weekend scale-down): -18% ($2,700/month saved)
- Spot instances for stateless workloads: -22% ($3,300/month saved)
- Storage optimization: -8% ($1,200/month saved)
- Reserved instances for baseline: -12% ($1,800/month saved)
- Total reduction: $15,000 → $5,800/month (61% savings; the line items were measured independently and overlap, so they sum to more than the combined saving)
“We were spending more on Kubernetes than our entire engineering team's coffee budget. After Vaarak's optimization, we're spending less than our Slack subscription. And our applications are actually more reliable because the autoscaling handles traffic spikes better than our old fixed-size cluster.”
— Jamie Torres, CTO at CloudSync
Cost optimization isn't a one-time project — it's an ongoing practice. We recommend scheduling quarterly cost reviews, setting up cost anomaly alerts, and making cost visibility a standard part of your engineering dashboards. The cloud providers are always adding new instance types, pricing models, and services that can reduce costs further.
Sarah Chen
Cloud Infrastructure Architect