How To Reduce Your Kubernetes Cost

Running Kubernetes can be very expensive, especially when it is done inefficiently. This is often the case when companies have just started to roll out Kubernetes in their organizations, because they tend to reuse the configuration and setup that worked well for initial test projects or small applications. Such an unoptimized initial setup is probably normal. However, companies should start considering cost early, because this can save a lot of money and prevents bad practices from spreading.

In this post, I will describe some ways to control and reduce your Kubernetes cost that apply to a wide range of Kubernetes use cases, from development and CI/CD to production.

Cost Monitoring

The first step to efficiently reducing your Kubernetes cost is to monitor it. This not only helps you determine how much you eventually saved but also gives you an overview of your most important cost drivers.

Public cloud providers offer basic monitoring of your computing cost, as they are the ones ultimately charging you. Besides the total cost, their billing overviews show what exactly is running and what drives your cost, e.g. which proportion of your cost is attributed to compute, storage, and network traffic.

However, these overviews only give you a basic understanding. They are of limited help for multi-tenant Kubernetes clusters, where several teams or applications share the same underlying machines, so the bill alone does not tell you who caused which cost. They are of course also not available in private clouds. Therefore, it often makes sense to use additional tools to measure your Kubernetes usage and costs. Some useful tools in this area are Prometheus, Kubecost, and Replex.
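To illustrate what such tools add on top of a cloud bill, here is a minimal sketch in Python that splits the hourly price of one node across namespaces in proportion to their CPU and memory requests. The node price, the 50/50 split of that price between CPU and memory, and all names are hypothetical assumptions; Kubecost and similar tools use their own, more detailed allocation models.

```python
from dataclasses import dataclass


@dataclass
class NamespaceUsage:
    name: str
    cpu_cores: float   # summed CPU requests of the namespace's pods on this node
    memory_gib: float  # summed memory requests in GiB


def allocate_node_cost(
    hourly_node_cost: float,
    node_cpu_cores: float,
    node_memory_gib: float,
    usages: list[NamespaceUsage],
    cpu_share: float = 0.5,  # assumption: half of the node price is attributed to CPU
) -> dict[str, float]:
    """Split one node's hourly cost across namespaces by their resource requests."""
    price_per_core = hourly_node_cost * cpu_share / node_cpu_cores
    price_per_gib = hourly_node_cost * (1 - cpu_share) / node_memory_gib
    costs = {
        u.name: u.cpu_cores * price_per_core + u.memory_gib * price_per_gib
        for u in usages
    }
    # Capacity nobody requested stays visible instead of being spread over tenants.
    # "(unallocated)" cannot clash with a real namespace name (no parentheses allowed).
    costs["(unallocated)"] = hourly_node_cost - sum(costs.values())
    return costs
```

For example, a hypothetical $0.40/hour node with 4 cores and 16 GiB, where team-a requests 2 cores and 4 GiB and team-b 1 core and 4 GiB, yields about $0.15/hour for team-a, $0.10/hour for team-b, and $0.15/hour of unallocated capacity. Keeping the unallocated share separate is a deliberate choice: it shows how much you pay for headroom that nobody asked for.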

Once you have set up your Kubernetes cost monitoring and identified areas of improvement, you can start with the actual cost optimizations.

Resource Limits

A good first cost-saving step is to set limits for resource usage. Efficient resource limits ensure that no application or user of your Kubernetes system consumes excessive computing resources. They therefore protect you from unpleasant surprises in the form of sudden cost increases.

Limits are especially critical if you give developers direct Kubernetes access, e.g. in the form of a self-service Kubernetes platform, because they enforce a fair sharing of available resources, which keeps the total cluster size smaller. Without limits, a single user could consume all resources and leave others unable to work, which in turn forces you to provision more computing resources in total.

This applies not only to engineers in a Kubernetes multi-tenancy environment but also to applications running on the same cluster. In any case, it is of course important to set the right limits: if they are too low, engineers and applications cannot work properly, and if they are too high, they are mostly useless. Here, the previously mentioned Kubernetes monitoring tools can help you determine the right limits for the different use cases.
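As a minimal sketch of how monitoring data can inform a limit, the Python snippet below takes a high percentile of observed usage samples and adds headroom. The 95th percentile, the 20% headroom, and the function names are illustrative assumptions rather than rules from any particular tool; the samples would come from your monitoring system, e.g. CPU usage in millicores.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile (pct between 0 and 100) of a non-empty list."""
    ordered = sorted(samples)
    # smallest value such that at least pct% of the samples are <= it
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_limit(samples: list[float], pct: float = 95.0, headroom: float = 0.2) -> float:
    """Hypothetical heuristic: a high usage percentile plus a safety margin."""
    return percentile(samples, pct) * (1 + headroom)


if __name__ == "__main__":
    # e.g. CPU usage in millicores, as collected by a monitoring system
    cpu_millicores = [float(v) for v in range(10, 210, 10)]
    print(suggest_limit(cpu_millicores))
```

With 20 samples from 10 to 200 millicores, the 95th percentile is 190 millicores, so the suggested limit is about 228 millicores. Using a percentile instead of the maximum keeps a single spike from inflating the limit, at the price of occasional CPU throttling (or out-of-memory kills, for memory) during such spikes.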

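To implement resource limits natively in Kubernetes, you can use Resource Quotas, which cap the total resources that all pods in a namespace may request or use, and Limit Ranges, which set default values and bounds for individual containers in a namespace.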
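A sketch of both objects for one tenant namespace, built as Python dictionaries and printed as a JSON `List`, which `kubectl apply -f` accepts just like YAML. The namespace name, object names, and all quantities are hypothetical placeholders that you would derive from your own monitoring data.

```python
import json


def tenant_guardrails(namespace: str) -> list[dict]:
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "compute-quota", "namespace": namespace},
        "spec": {
            # upper bounds for the sum over all pods in the namespace
            "hard": {
                "requests.cpu": "4",
                "requests.memory": "8Gi",
                "limits.cpu": "8",
                "limits.memory": "16Gi",
            }
        },
    }
    limit_range = {
        "apiVersion": "v1",
        "kind": "LimitRange",
        "metadata": {"name": "container-defaults", "namespace": namespace},
        "spec": {
            "limits": [
                {
                    "type": "Container",
                    # applied to containers that do not declare their own values
                    "defaultRequest": {"cpu": "250m", "memory": "256Mi"},
                    "default": {"cpu": "500m", "memory": "512Mi"},
                    # no single container may ask for more than this
                    "max": {"cpu": "2", "memory": "4Gi"},
                }
            ]
        },
    }
    return [quota, limit_range]


if __name__ == "__main__":
    manifest = {"apiVersion": "v1", "kind": "List", "items": tenant_guardrails("team-a")}
    print(json.dumps(manifest, indent=2))
```

Saving the output to a file and running `kubectl apply -f` on it creates both objects. Pairing them matters: once a quota covers CPU and memory, Kubernetes rejects pods that do not specify the corresponding requests or limits, and the LimitRange defaults fill these in for containers that omit them.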

Right-Sizing and Autoscaling

Now that you have cost monitoring in place and have prevented unexpected cost explosions, you can focus on saving costs. The best way to do this is to pay only for what you actually need. For this, it is important to get the size of your clusters, virtual clusters (vClusters), namespaces, etc. right.

Again, monitoring tools are helpful for this, but manually observing and adjusting computing resources is neither fast nor flexible. To respond to short-term fluctuations, you should therefore enable Kubernetes autoscaling.

There are two types of autoscaling: horizontal and vertical. In short, horizontal autoscaling adds or removes pods when the load is above or below a predefined target, whereas vertical autoscaling adjusts the resources, i.e. the CPU and memory requests, of the individual pods.
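The horizontal case boils down to a simple ratio. The Kubernetes documentation describes the Horizontal Pod Autoscaler's core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with a default tolerance of 0.1 within which no scaling happens. The Python sketch below reproduces that rule with hypothetical min/max bounds; it deliberately omits parts of the real controller such as stabilization windows and the handling of pods that are not ready yet.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,  # e.g. average CPU utilization across pods, in percent
    target_metric: float,   # the target configured for the autoscaler
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,  # ratios this close to 1.0 cause no change (avoids flapping)
) -> int:
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # clamp to the configured bounds
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas running at 90% average CPU utilization with a 60% target scale to 6, while 4 replicas at 20% scale down to 2, which is where the cost savings come from.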

Both types of autoscaling help to adjust your available computing resources automatically to your actual needs. However, this process is not always optimal, because it may be too “conservative” or may not work for all use cases. For example, a workload that nobody uses anymore may still be running, and since it keeps requesting computing resources, it will not be scaled down automatically. Therefore, you should implement further measures to drive your Kubernetes cost down.

Discounted Computing Resources

Another way to save computing cost is to use resources that have a discounted price. Of course, this only works in public clouds that offer such resources. However, all major cloud providers have some options for this:

AWS Spot Instances, GCP Preemptible VMs, and Azure Spot VMs provide heavily discounted computing resources that are “excess capacity” of the cloud providers. The providers sell this capacity at a lower price because it would otherwise go unused. However, this also means that prices for these resources fluctuate with demand and that your instance and application may be stopped if the price exceeds your limit or if no capacity is available anymore. Spot instances are therefore mostly an option for applications and workloads that are not needed permanently and can tolerate interruptions, e.g. a computing-intensive experiment. If you can use this type of discounted computing, however, you can save a lot of money: AWS claims that you may get “up to a 90% discount”.

If spot instances and preemptible VMs are not an option for your application, for example because it must run without any disruption, you may instead get a discount for committing to use resources continuously for a fixed period. By committing to a one- or three-year usage period, you can get a significant discount: 40%-60% according to AWS. AWS and Azure offer this type of long-term discount as “Reserved Instances”, and Google as “Committed Use Discounts”. It makes sense if you have a predictable and continuous need for computing resources over a longer period of time.
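To check whether a commitment pays off, compare it with on-demand pricing at your expected utilization. Because committed capacity is billed for every hour of the term whether you use it or not, a commitment with discount d only beats on-demand if you need the resource more than a fraction 1 - d of the time. The Python sketch below assumes a hypothetical on-demand price of $0.10 per hour and ignores upfront-payment options and instance flexibility.

```python
HOURS_PER_YEAR = 24 * 365


def yearly_cost_on_demand(hourly_price: float, utilization: float) -> float:
    """On-demand: you only pay for the hours the instance actually runs."""
    return hourly_price * HOURS_PER_YEAR * utilization


def yearly_cost_committed(hourly_price: float, discount: float) -> float:
    """Committed use: discounted price, but billed for every hour of the term."""
    return hourly_price * (1 - discount) * HOURS_PER_YEAR


def break_even_utilization(discount: float) -> float:
    """Utilization above which the commitment is cheaper than on-demand."""
    return 1 - discount


if __name__ == "__main__":
    price = 0.10  # hypothetical on-demand price per hour
    for utilization in (0.5, 0.9):
        print(utilization, yearly_cost_on_demand(price, utilization), yearly_cost_committed(price, 0.4))
```

With a 40% discount, the break-even utilization is 60%. At 90% utilization, the commitment costs about $526 per year versus about $788 on demand; at 50% utilization, on-demand ($438) is cheaper.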

Overall, which type of discount is applicable depends on your type of application and computing needs. You can also use both types for different parts of your application or for different use cases. So even though the pricing of all public cloud providers is quite complicated, it can certainly pay off to take a closer look at the available discounts, since they give you the same computing resources at a lower price.

Sleep Mode

There are quite a few scenarios in which clusters, vClusters, and namespaces continue to run and create costs even though they are no longer needed. This often (but not only) happens when engineers use Kubernetes for development, testing, or CI/CD.

For example, if a developer uses a Kubernetes development environment in the cloud, they only need it during their working hours. If you assume that they work with the environment 40 hours per week (not accounting for meetings or holidays) while it runs all the time, the environment sits idle for 128 of the 168 hours in a week. Shutting it off whenever it is not needed therefore saves more than 75% (128/168 ≈ 76%) of its computing cost.

The developers could of course shut down their environments themselves when they are done, but this step is easily forgotten or ignored. It therefore makes sense to automate it with a “sleep mode”, i.e. an automated process that scales down unused namespaces and virtual clusters. A sleep mode stores the state of the environment so that it can “wake up” again quickly and automatically when it is needed, without interrupting the engineer’s workflow.
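A minimal sketch of the scale-down part of a sleep mode, written in Python against plain Deployment manifests (dictionaries) rather than a live cluster. It records the replica count in an annotation before scaling to zero, so that waking up can restore it, while volumes and configuration stay untouched. The annotation key, the function names, and the idle threshold are hypothetical; this is not necessarily how Loft or any other tool implements its sleep mode, and how activity is detected (e.g. from incoming requests) is left out.

```python
from datetime import datetime, timedelta

# Hypothetical annotation key, not used by Kubernetes or any specific tool
SLEEP_ANNOTATION = "example.com/replicas-before-sleep"


def put_to_sleep(deployment: dict) -> None:
    """Scale a Deployment manifest to zero, remembering its replica count."""
    annotations = deployment["metadata"].setdefault("annotations", {})
    if SLEEP_ANNOTATION not in annotations:  # already asleep: keep the saved value
        # Kubernetes defaults spec.replicas to 1 when it is omitted
        annotations[SLEEP_ANNOTATION] = str(deployment["spec"].get("replicas", 1))
    deployment["spec"]["replicas"] = 0


def wake_up(deployment: dict) -> None:
    """Restore the replica count saved when the Deployment went to sleep."""
    annotations = deployment["metadata"].get("annotations", {})
    if SLEEP_ANNOTATION in annotations:
        deployment["spec"]["replicas"] = int(annotations.pop(SLEEP_ANNOTATION))


def should_sleep(last_activity: datetime, now: datetime, idle_after: timedelta) -> bool:
    """True once an environment has been idle for at least idle_after."""
    return now - last_activity >= idle_after
```

A controller built on this idea would periodically check `should_sleep` for each namespace and apply `put_to_sleep` or `wake_up` to its Deployments through the Kubernetes API.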

You can implement such a sleep mode with scripts or use tools that have a sleep mode built in, such as Loft.

For a more detailed description and calculation of the cost-saving potential of a sleep mode, you should look at my article about saving Kubernetes cost for engineers.

Cleanup

Closely related to a sleep mode, which scales down computing resources that are temporarily not needed, is cleaning up your Kubernetes system from time to time. Especially if you allow engineers to create namespaces on demand or if you use Kubernetes for CI/CD, you may have a lot of unused namespaces or even clusters that still cost you money.

Even if you have a sleep mode in place, it is only intended for temporarily unused computing resources and therefore preserves the state, including storage and configuration. CI/CD pipelines and tests, however, usually do not need the state of their Kubernetes environment to be preserved, so it is better to delete these environments entirely.
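A minimal cleanup sketch in Python: it selects namespaces that carry a hypothetical “ephemeral” label and are older than a time-to-live. The input has the shape of the `items` in `kubectl get namespaces -o json`; the label key and the 24-hour TTL are assumptions. Requiring an explicit label is a deliberate safety choice, so that long-lived namespaces are never deleted by accident.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical label that CI jobs would put on the namespaces they create
EPHEMERAL_LABEL = "example.com/ephemeral"


def parse_k8s_timestamp(ts: str) -> datetime:
    # creationTimestamp is serialized as RFC 3339 in UTC, e.g. "2024-05-01T08:00:00Z"
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def namespaces_to_delete(namespaces: list[dict], now: datetime, ttl: timedelta) -> list[str]:
    expired = []
    for ns in namespaces:
        meta = ns["metadata"]
        if meta.get("labels", {}).get(EPHEMERAL_LABEL) != "true":
            continue  # never touch namespaces that were not marked as disposable
        if now - parse_k8s_timestamp(meta["creationTimestamp"]) > ttl:
            expired.append(meta["name"])
    return expired
```

Each returned name could then be passed to `kubectl delete namespace`, which deletes the namespace together with everything in it, including its persistent volume claims.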

Again, this can be done with custom scripts or with Loft, which provides such an auto-delete feature out of the box. As a positive side effect, deleting unused environments makes it easier for admins to keep an overview of the system, which can also be seen as an indirect cost saving.

Cluster Sharing

A final Kubernetes cost-saving approach is to reduce the number of clusters. This generally saves cost because several users or applications can share the control plane and the computing resources in a Kubernetes multi-tenancy setup. If you use a public cloud that charges a cluster management fee, such as AWS or Google Cloud, you also save this fee for every cluster you remove. Finally, managing many clusters is a complex task, so limiting their number also relieves the admins.
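A back-of-the-envelope calculation of the fee savings alone, in Python. It assumes a management fee of $0.10 per cluster hour, which matches what AWS (EKS) and Google Cloud (GKE) have charged for standard clusters; check your provider's current pricing, since fees differ by tier and free-tier credits may apply. The 730 hours per month is the usual billing approximation.

```python
HOURS_PER_MONTH = 730  # common billing approximation (8760 hours / 12)


def monthly_fee_savings(
    clusters_before: int,
    clusters_after: int,
    fee_per_cluster_hour: float = 0.10,  # assumption: check your provider's pricing
) -> float:
    """Cluster management fees saved per month by consolidating clusters."""
    if clusters_after > clusters_before:
        raise ValueError("consolidation should not increase the cluster count")
    return (clusters_before - clusters_after) * fee_per_cluster_hour * HOURS_PER_MONTH
```

Under these assumptions, consolidating 50 per-developer clusters into a single shared cluster would save about $3,577 per month in fees, before counting any savings from better resource utilization.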

Especially during the pre-production stage, many companies have too many clusters, sometimes even one for every individual developer. This is clearly inefficient, because a whole cluster is rarely needed during development. Even if developers want to experiment with cluster-wide configurations, they could use a virtual cluster as their dev environment instead. Relying on a multi-tenant internal Kubernetes platform for all non-production use cases can therefore save a lot of cost by eliminating unnecessary redundancies.

Reducing the number of clusters can also make sense in production, based on the same underlying idea of improving resource utilization, reducing redundancies, and avoiding cluster management fees. Of course, this does not mean that you should have only one cluster. Rather, you should evaluate which user groups and applications can share a cluster and when dedicated clusters make sense.

For a more detailed discussion on this topic, also read my post about Kubernetes cost savings by reducing the number of clusters.

Conclusion

I believe that the cost of running Kubernetes at a larger scale and with many users will soon become an issue for more and more companies. Fortunately, many initial inefficiencies can be removed relatively easily and still lead to large cost savings.

The first step in managing your Kubernetes cost is to get an overview and start monitoring it. Then, you should implement limits to prevent excessive consumption of computing resources, which makes your cost more predictable. To reduce your cost, it is important to find the right size for your clusters, virtual clusters, and namespaces, and autoscaling can be very helpful for this. If you run in a public cloud, you should also look at its discount options. An automatic sleep mode and cleanup for unused Kubernetes namespaces and vClusters are further measures to eliminate idle resources. Finally, you should consider reducing the number of clusters, because this improves resource utilization and reduces unnecessary redundancies, especially at the pre-production stage.

If you implement all these measures, your Kubernetes system will be well cost-optimized and thus much cheaper than an initial best-guess setup. You should then be ready to tackle your next challenges with Kubernetes without having to fear its cost.

Photo by Alexander Mils on Unsplash