TL;DR
- Most Kubernetes clusters waste 30-50% of cloud spend on over-provisioned requests, idle nodes, and forgotten workloads.
- The biggest wins come from three levers: rightsizing (match requests to real usage), autoscaling (scale to demand), and spot instances (pay 60-90% less for interruptible compute).
- Cost work is not a one-time cleanup — without governance and visibility, savings erode within a quarter as new workloads ship.
- Tools automate the toil: Kubecost for visibility, Cast AI/ScaleOps for autonomous rightsizing, and platforms like SRExpert that surface rightsizing recommendations alongside security, compliance and monitoring.
Why Kubernetes Bills Spiral
Kubernetes makes it trivially easy to ship workloads — and just as easy to waste money. The same flexibility that lets you scale in seconds also lets every team over-request CPU "to be safe," leave dev namespaces running over the weekend, and forget the autoscaler maximums set during a launch six months ago.
Industry surveys consistently put Kubernetes resource waste at 30-50%. The FinOps Foundation's 2025 State of FinOps report lists "reducing waste / managing overspend" as the #1 priority for the second year running, and Kubernetes is repeatedly named the hardest environment to attribute and control.
The waste hides in five places:
- Over-provisioned requests — engineers request 2 CPU / 4Gi "to be safe"; the pod uses 0.2 CPU / 500Mi.
- Idle nodes — the cluster autoscaler keeps warm capacity that nothing schedules onto.
- On-demand pricing everywhere — stateless, retry-safe workloads running on full-price nodes instead of spot.
- Forgotten workloads — dev/staging namespaces, old CronJobs, and PoCs nobody turned off.
- No attribution — without per-namespace/per-team cost, nobody owns the bill, so nobody cuts it.
Here are the eight strategies that address them, ordered by typical ROI.
1. Rightsize Workload Requests and Limits
The single biggest lever. CPU and memory requests determine how much capacity the scheduler reserves — and you pay for reserved capacity whether or not it is used.
Measure the gap between requested and actually used resources (the Vertical Pod Autoscaler in recommendation mode, Kubecost, or your platform's analytics will show it), then bring requests down toward real P95 usage plus some headroom.
# Before: requested "to be safe"
resources:
  requests:
    cpu: "2000m"
    memory: "4Gi"

# After: based on P95 actual usage
resources:
  requests:
    cpu: "250m"
    memory: "768Mi"
A realistic rightsizing pass across a mid-size cluster recovers 25-40% of node cost on its own. The catch: do it per-workload from real data, and re-check quarterly — usage drifts.
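One way to collect that usage data is to run the Vertical Pod Autoscaler in recommendation-only mode. The manifest below is a minimal sketch: it assumes the VPA components are already installed in the cluster, and the Deployment name `checkout` and namespace `team-a` are hypothetical. With `updateMode: "Off"`, the VPA publishes recommendations but does not evict or resize pods.

```yaml
# Recommendation-only VPA (sketch; names are hypothetical)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: checkout-vpa
  namespace: team-a
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout        # hypothetical workload to observe
  updatePolicy:
    updateMode: "Off"     # recommend only, never apply changes
```

After it has observed some real traffic, `kubectl describe vpa checkout-vpa -n team-a` should list a recommended target with lower and upper bounds, which you can compare against your own P95 numbers before editing the manifest.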
2. Autoscale at Every Layer
Three autoscalers, three jobs:
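- Horizontal Pod Autoscaler (HPA): adds or removes pod replicas based on CPU, memory, or custom metrics.
- Vertical Pod Autoscaler (VPA): adjusts pod requests automatically. Avoid letting it apply changes to a workload whose HPA scales on the same CPU or memory metric, because the two controllers will work against each other.
- Cluster Autoscaler / Karpenter: adds or removes nodes to match pod demand.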
Karpenter (originally built for AWS) and the standard Cluster Autoscaler remove nodes once nothing needs them, so you are not paying for warm capacity at 3 AM when traffic is flat. Pair node autoscaling with rightsized requests; otherwise the autoscaler just provisions nodes for bloated reservations.
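For the pod layer, a basic HPA might look like the sketch below. The Deployment name `web`, the replica bounds, and the 70% target are illustrative assumptions, not recommendations for any particular workload.

```yaml
# CPU-based HPA (sketch; names and numbers are hypothetical)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
  namespace: team-a
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10           # revisit after launches so it does not stay inflated
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the pod's CPU request
```

Because the utilization target is measured against the CPU request, lowering a request during rightsizing also lowers the absolute usage at which the HPA adds replicas, so it is worth re-checking both together.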
3. Use Spot / Preemptible Instances
Spot Instances (AWS), Spot VMs (Azure), and Spot VMs or the older Preemptible VMs (GCP) typically cost 60-90% less than on-demand. The trade-off is that they can be reclaimed on short notice, from about 30 seconds on Azure and GCP to 2 minutes on AWS, so they suit stateless, retry-tolerant, or batch workloads.
The pattern: run a small on-demand baseline for stateful and latency-critical pods, and a spot pool for everything else, with nodeSelector/affinity and PodDisruptionBudgets to handle reclaims gracefully. This is where autonomous tools earn their keep — see the tooling section.
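The spot half of that pattern can be sketched as a Deployment pinned to spot nodes plus a PodDisruptionBudget. Everything named here is an assumption for illustration: the `worker` app, the image, and in particular the node label, which depends on how your nodes are provisioned (the Karpenter label is shown; GKE uses `cloud.google.com/gke-spot: "true"` and EKS managed node groups use `eks.amazonaws.com/capacityType: SPOT`).

```yaml
# Stateless worker on spot capacity (sketch; names are hypothetical)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker
  namespace: team-a
spec:
  replicas: 4
  selector:
    matchLabels:
      app: worker
  template:
    metadata:
      labels:
        app: worker
    spec:
      nodeSelector:
        karpenter.sh/capacity-type: spot   # assumption: Karpenter-managed nodes
      terminationGracePeriodSeconds: 25    # shorter than the shortest reclaim notice
      containers:
        - name: worker
          image: registry.example.com/worker:1.0.0   # placeholder image
          resources:
            requests:
              cpu: 250m
              memory: 768Mi
---
# Keep at least 3 replicas up during node drains
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: worker-pdb
  namespace: team-a
spec:
  minAvailable: 3
  selector:
    matchLabels:
      app: worker
```

A PodDisruptionBudget limits voluntary evictions such as drains; it cannot stop the cloud provider from reclaiming a node, so the workload still has to tolerate losing a pod abruptly.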
4. Bin-Pack and Consolidate Nodes
A cluster that runs 20 nodes at 35% utilization is doing roughly 7 nodes' worth of work (20 × 0.35), which fits on about 9 nodes at 80%. Bin-packing schedules pods densely so you run fewer, better-utilized nodes. Karpenter's consolidation and tools like ScaleOps do this continuously, draining and removing under-used nodes automatically.
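With Karpenter, consolidation is configured on the NodePool. The sketch below follows the `karpenter.sh/v1` API as we understand it; field names differ in older Karpenter releases, so check the documentation for the version you run. The `EC2NodeClass` named `default` and the CPU limit are hypothetical.

```yaml
# Karpenter NodePool with consolidation enabled (sketch; AWS provider assumed)
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default                 # hypothetical node class
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
  disruption:
    # Remove empty nodes and repack pods from under-used ones
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
  limits:
    cpu: "200"                        # cap on total CPU this pool may provision
```

Consolidation works by evicting and rescheduling pods, so it is usually paired with PodDisruptionBudgets on anything that should not lose several replicas at once.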
5. Set Resource Quotas and LimitRanges
Governance stops waste before it ships. ResourceQuota caps total CPU/memory per namespace; LimitRange sets sane defaults so a pod that declares no requests or limits cannot consume a whole node.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
This is also a security and stability control — the same kind of policy enforcement SRExpert applies for compliance and guardrails.
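A companion LimitRange for the same namespace might look like this. The default and maximum values are illustrative assumptions and should come from your own usage data.

```yaml
# Per-container defaults and ceilings for team-a (sketch; values are hypothetical)
apiVersion: v1
kind: LimitRange
metadata:
  name: team-defaults
  namespace: team-a
spec:
  limits:
    - type: Container
      defaultRequest:      # applied when a container sets no requests
        cpu: 100m
        memory: 128Mi
      default:             # applied when a container sets no limits
        cpu: 500m
        memory: 512Mi
      max:                 # largest limit a single container may declare
        cpu: "2"
        memory: 4Gi
```

The two objects work together: once a ResourceQuota constrains `requests.cpu` or `requests.memory`, pods that omit those requests are rejected unless a LimitRange supplies defaults.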
6. Kill Idle and Zombie Workloads
Scale dev/staging to zero outside working hours (a simple CronJob or KEDA can do it). Hunt down: completed Jobs that were never cleaned up, orphaned PersistentVolumes still being billed, LoadBalancers pointing at deleted services, and namespaces from PoCs that ended months ago. This is unglamorous and recovers real money every single month.
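The CronJob approach to scale-to-zero can be sketched as follows. It assumes a namespace called `dev`, a schedule of 19:00 on weekdays in the cluster's default time zone, and a container image that provides the `kubectl` binary; the image reference shown is an assumption, so substitute one you trust. A mirror-image job would scale things back up in the morning.

```yaml
# Scale every Deployment in "dev" to zero on weekday evenings (sketch)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: scale-down
  namespace: dev
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: scale-deployments
  namespace: dev
rules:
  - apiGroups: ["apps"]
    resources: ["deployments", "deployments/scale"]
    verbs: ["get", "list", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: scale-down
  namespace: dev
subjects:
  - kind: ServiceAccount
    name: scale-down
    namespace: dev
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: scale-deployments
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: dev-scale-down
  namespace: dev
spec:
  schedule: "0 19 * * 1-5"          # 19:00, Monday to Friday
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 3600 # clean up finished Jobs after an hour
      template:
        spec:
          serviceAccountName: scale-down
          restartPolicy: Never
          containers:
            - name: kubectl
              image: registry.k8s.io/kubectl:v1.31.0   # assumption: any image with kubectl works
              command: ["kubectl"]
              args: ["scale", "deployment", "--all", "--replicas=0", "-n", "dev"]
```

Setting `ttlSecondsAfterFinished` keeps this job from adding to the pile of completed Jobs it is meant to reduce.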
7. Attribute Cost by Team and Namespace
You cannot cut what you cannot see. Cost allocation by namespace, label, and team turns "the Kubernetes bill is too high" into "team-a's batch job costs €4,000/month." Kubecost and OpenCost are the standard for showback/chargeback. Attribution is what makes every other strategy stick, because it creates an owner.
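Attribution depends on consistent metadata. As a minimal sketch, a namespace can carry ownership labels that a cost tool then groups by; the label keys and values below are hypothetical conventions, and which labels a given tool reads is a matter of its own configuration.

```yaml
# Ownership labels for cost allocation (sketch; keys and values are hypothetical)
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
  labels:
    team: team-a
    cost-center: cc-1234
    environment: production
```

Whatever keys you choose, applying the same set to every namespace and workload matters more than the exact names, since unlabeled resources end up in an unowned bucket.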
8. Make Optimization Continuous (Governance)
The hard truth: a one-time cleanup decays. New services ship over-provisioned, launch-time autoscaler maximums never get lowered, and within a quarter you are back where you started. Bake cost into the workflow — rightsizing recommendations in the dashboard your team already uses, alerts when a namespace's spend jumps, and a quarterly review. Continuous beats heroic.
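Spend metrics depend on the cost tool you run, so the sketch below substitutes a proxy that most clusters already expose: the ratio of CPU actually used to CPU requested per namespace. It assumes the Prometheus Operator (for the `PrometheusRule` resource), kube-state-metrics, and cAdvisor metrics are available; the rule name, namespace, and 30% threshold are hypothetical.

```yaml
# Alert when a namespace uses under 30% of the CPU it requests (sketch)
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cost-guardrails
  namespace: monitoring
spec:
  groups:
    - name: request-utilization
      rules:
        - alert: NamespaceCpuOverRequested
          expr: |
            sum by (namespace) (rate(container_cpu_usage_seconds_total{container!=""}[1h]))
              /
            sum by (namespace) (kube_pod_container_resource_requests{resource="cpu"})
              < 0.3
          for: 24h                    # ignore short quiet periods
          labels:
            severity: info
          annotations:
            summary: "{{ $labels.namespace }} uses {{ $value | humanizePercentage }} of its requested CPU"
```

Routing this alert to the owning team's channel, rather than to a central platform inbox, ties it back to the attribution work above.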
The Tooling Landscape
No single tool wins every category. Here is how the main options map to the strategies above:
| Tool | Focus | Rightsizing | Spot automation | Cost visibility | Beyond cost |
|---|---|---|---|---|---|
| Kubecost / OpenCost | Cost visibility & allocation | Recommendations | No | Excellent | No |
| Cast AI | Autonomous FinOps | Automated | Automated | Good | No |
| ScaleOps | Autonomous rightsizing | Automated | Pool management | Good | No |
| Karpenter | Node provisioning | Indirect | Yes | No | No |
| SRExpert | K8s operations platform | Recommendations + one-click apply | Spot candidate detection | Built-in | Security, compliance, alerting, AI, Helm |
If cost is your only problem, a dedicated FinOps tool like Cast AI or Kubecost goes deepest. If cost is one of several operational concerns — alongside security, compliance, monitoring and alerting — a unified platform avoids running (and paying for) yet another point tool. SRExpert surfaces rightsizing recommendations and spot candidates as one capability inside the same dashboard you use to monitor, secure and operate your clusters. See the full observability guide for how cost and performance data fit together.
A Realistic 30-Day Plan
You do not need a FinOps team to start. A pragmatic sequence:
- Week 1 — See it. Install Kubecost/OpenCost or enable your platform's cost view. Get per-namespace spend. Find the top 5 most expensive workloads.
- Week 2 — Rightsize the top 5. Pull P95 usage, lower requests, redeploy, confirm no regression.
- Week 3 — Autoscale + spot. Turn on HPA where missing, move one stateless workload to a spot pool, validate it survives a reclaim.
- Week 4 — Govern. Add ResourceQuotas per namespace, schedule dev scale-to-zero, set a spend alert, and book a recurring quarterly review.
A team that does this typically lands 25-40% lower cloud spend within the first month — and, with governance in place, keeps it.
FAQ
How much can Kubernetes cost optimization actually save? Industry surveys put waste at 30-50% of Kubernetes spend. Rightsizing alone commonly returns 25-40% of node cost; adding spot instances and bin-packing can push total savings toward 40-60% for spot-tolerant workloads.
What is the difference between requests and limits for cost? Requests reserve capacity on a node whether it is used or not, and reserved capacity determines how many nodes you pay for, so requests are the primary cost lever. Limits cap maximum usage (protecting against noisy neighbors and runaway memory growth) but do not directly drive reserved cost.
Are spot instances safe for production? Yes, for stateless and retry-tolerant workloads, when paired with PodDisruptionBudgets, a small on-demand baseline for critical pods, and graceful handling of reclaim notices. Avoid spot for stateful databases and latency-critical singletons.
Do I need a dedicated FinOps tool like Cast AI or Kubecost? If cost is your only concern, a dedicated tool goes deepest. If cost is one of several operational needs, a unified platform like SRExpert covers the 80% of cost pain (rightsizing recommendations, spot candidates, attribution) without adding a separate tool to your stack.
Start Optimizing
SRExpert's free tier includes cluster monitoring and resource analytics for one cluster — enough to spot your biggest waste in an afternoon. Install in 5 minutes via Helm, connect a cluster, and review the rightsizing recommendations.
Start free at srexpert.cloud/try-now

