Kubernetes requests and limits define the amount of CPU and memory that pods and containers are able to consume. Setting appropriate requests and limits on your pods prevents resource contention and helps the scheduler place workloads efficiently onto your cluster’s nodes.
Unexpected resource consumption is one of the most serious challenges that Kubernetes administrators encounter. A single misconfiguration or spike in user activity can cause containers to consume excessive CPU or memory. This often has unintended consequences for neighboring pods—if memory capacity is exceeded, then the operating system’s out-of-memory (OOM) killer will intervene by terminating pods.
In this article, you’ll learn how to use Kubernetes requests and limits to control resource consumption and prevent runaway usage in one pod from negatively impacting your other pods.
## Setting Up Kubernetes Requests
Kubernetes requests specify the minimum amount of a resource (CPU or memory) that a pod requires to run. This information is used by the Kubernetes scheduler to determine which node in the cluster the pod should be placed on.
Resource requests are defined in the `spec.containers.resources.requests` field of your pod manifests. The following example creates a pod that requests 100 CPU millicores and 150 mebibytes of memory:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: requests-demo
spec:
  containers:
    - name: nginx
      image: nginx:latest
      resources:
        requests:
          cpu: "100m"
          memory: "150Mi"
```
It’s possible to define CPU constraints as fractional values instead of an absolute quantity in millicores. In this case, `1.0` is equivalent to 1 CPU core, and `0.5` is half a CPU core. These values translate to the millicore quantities of `1000m` and `500m`, respectively.
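For instance, the following fractional request is equivalent to specifying `500m`:

```yaml
resources:
  requests:
    cpu: "0.5"  # equivalent to cpu: "500m"
```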
Save the example manifest as `requests-demo.yaml`, then use `kubectl` to add the pod to your cluster:
```
$ kubectl apply -f requests-demo.yaml
pod/requests-demo created
```
The `kubectl describe` command will reveal the requests applied to the pod within the “Containers” section of its output:
```
$ kubectl describe pod requests-demo
Name:         requests-demo
Namespace:    default
...
Containers:
  nginx:
    ...
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        100m
      memory:     150Mi
    Environment:  <none>
```
## How Kubernetes Uses Requests
Pods that declare a resource request will only schedule to nodes that can provide the requested quantity of the resource. This is determined by calculating the sum of the existing requests made by the pods already running on each node. Here’s an example:
| Node Memory Capacity | New Pod Memory Request | Sum of Existing Pods' Memory Requests | New Pod Allowed to Schedule? |
|---|---|---|---|
| 2000 Mi | 500 Mi | 0 Mi | Yes |
| 2000 Mi | 500 Mi | 1000 Mi | Yes |
| 2000 Mi | 500 Mi | 1600 Mi | No |
The last combination fails because the new pod’s request of 500 MiB cannot be satisfied; the node’s capacity is 2000 MiB, and the pods already running on the node are requesting 1600 MiB. The new pod is prevented from scheduling because it would cause the total memory request to exceed the node’s capacity. CPU requests are considered in the same way.
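The capacity math above can be sketched in a few lines of Python. This is only a simplification of the scheduler's admission check; the real kube-scheduler also weighs taints, affinity, and other predicates:

```python
# Simplified sketch of the scheduler's request-based capacity check.
def can_schedule(node_capacity_mi: int, existing_requests_mi: list[int],
                 new_request_mi: int) -> bool:
    """True if the new pod's memory request fits within the node's capacity."""
    return sum(existing_requests_mi) + new_request_mi <= node_capacity_mi

# The three combinations from the table above:
print(can_schedule(2000, [], 500))      # True
print(can_schedule(2000, [1000], 500))  # True
print(can_schedule(2000, [1600], 500))  # False
```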
Requests don’t limit the actual resource usage of pods once they’re scheduled to nodes. Pods are free to use more CPU and memory than their requests allow when there is spare capacity on the node.
CPU requests act as weights that preserve quality of service when resource contention occurs. If the CPU is at 100 percent utilization, then pods will be able to use the proportion of the available CPU time that matches the weight of their request:
| Available CPU | Pod 1 CPU Request | Pod 2 CPU Request | Pod 1 Effective CPU | Pod 2 Effective CPU |
|---|---|---|---|---|
| 1000m | 500m | 250m | 667m | 333m |
The table above describes a situation where there is spare CPU available on the node and both pods try to utilize it. Because CPU requests are used, the relative weighting of Pod 1 and Pod 2 is still preserved—both pods can consume more CPU than their request defines, but Pod 1 continues to receive a greater share of the available capacity.
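The proportional split under contention can be sketched with a short Python function. This is a simplification of the Linux CFS share weighting that Kubernetes relies on, not the exact kernel algorithm:

```python
def effective_cpu(available_m: int, requests_m: list[int]) -> list[int]:
    """Divide contended CPU millicores proportionally to each pod's request."""
    total = sum(requests_m)
    return [round(available_m * r / total) for r in requests_m]

# With 1000m of CPU contended by pods requesting 500m and 250m,
# the 2:1 weighting between the pods is preserved:
print(effective_cpu(1000, [500, 250]))  # [667, 333]
```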
Exceeding a memory request has slightly different effects due to the finite nature of memory. Containers are able to use more memory when it’s available, but doing so makes them candidates for eviction if the node begins to experience a memory pressure situation. When Kubernetes needs to free up memory, pods using more than their request will be the first to be targeted.
## Setting Up Kubernetes Limits
A limit is a second kind of constraint that prevents pods from using more than a specified amount of a resource. Whereas requests manage workload-level contention and quality of service, limits are a device for maintaining cluster stability.
Limits are set in a similar way to requests by using the `spec.containers.resources.limits` field in your pod manifests:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: limits-demo
spec:
  containers:
    - name: nginx
      image: nginx:latest
      resources:
        limits:
          cpu: 100m
          memory: 150Mi
```
You can save the pod manifest to `limits-demo.yaml`, apply it to your cluster, and check that the limits have been set:
```
$ kubectl apply -f limits-demo.yaml
pod/limits-demo created

$ kubectl describe pod limits-demo
Name:         limits-demo
Namespace:    default
...
Containers:
  nginx:
    ...
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  150Mi
    Requests:
      cpu:     100m
      memory:  150Mi
    Environment:  <none>
```
Limits are designed to be used in conjunction with requests. If you set a resource limit but omit its request—as in this example—then Kubernetes will automatically apply a request that’s equal to your declared limit.
## How Kubernetes Uses Limits
Kubernetes uses resource limits to identify pods that are depriving their neighbors of resources and could destabilize the cluster. Limits indicate that the pod must not be allowed to use more than a defined amount of a resource, but enforcement is applied differently for CPU and memory constraints:
- Exceeding a CPU limit will not result in a pod being terminated. Instead, the container’s CPU usage is throttled back down to the limit, and this throttling applies even when there is spare CPU capacity on the node.
- Exceeding a memory limit results in the system’s out-of-memory (OOM) killer intervening. This will kill the process that tried to allocate the additional memory. As there’ll usually only be one process in a container, this normally results in the container terminating and being restarted.
Using limits allows you to set a hard cap on pod resource utilization. Correctly configured limits will prevent individual pods from destabilizing the cluster and their neighbors.
## Best Practices for Setting Kubernetes Requests and Limits
Requests and limits require some finessing. Poorly chosen values can reduce your cluster’s performance or cause waste. For example, if a request is set higher than your application actually requires, this would cause some cluster capacity to be permanently allocated to your pod even though it will never be used.
Below are a few best practices to follow as you set your resource constraints.
## Don’t Use CPU Limits
CPU limits aren’t required to guarantee a fair allocation of CPU time. Imposing a limit on a pod will throttle its CPU usage even when there are actually spare CPU cycles available. This can reduce performance.
Instead, it’s best to set a CPU request on each of your pods while omitting a limit. When resource contention occurs, the pods will be granted the proportion of available CPU time that matches the relative weight of their request. During periods of no contention, each pod will be able to access the CPU’s full performance.
The following configuration sets a request for 100 millicores, equivalent to 0.1 of a CPU core. In practice, the pod will be able to use more CPU capacity when it’s available, but it will always have a half share when compared to another running pod with a request of `200m`:

```yaml
resources:
  requests:
    cpu: 100m
```
## Always Set Equal Memory Requests and Limits
Unlike CPU time, memory is a finite resource. While you can set different requests and limits for a pod, this is generally undesirable and can result in confusing behavior.
You will usually want to allocate a specific amount of memory to a pod, then have it terminate when it tries to consume more. This matches the behavior of most other memory allocation systems outside Kubernetes.
By setting equal memory requests and limits, you can ensure each pod has access to the memory it needs but will be terminated as soon as it uses more. Setting limits higher than requests exposes you to unexpected pod evictions if the cluster has to reclaim memory while the pod is exceeding its request.
The following snippet guarantees that the pod can allocate 500 MiB of memory but will be prevented from consuming more:
```yaml
resources:
  requests:
    memory: 500Mi
  limits:
    memory: 500Mi
```
## Regularly Review and Rightsize Your Requests and Limits
It’s important to ensure your requests and limits are appropriate for your workload. To achieve this, you should first set the values to the highest amount you anticipate needing, then regularly tune them based on actual utilization.
Rightsizing resource allocations to match your workload’s requirements will ensure optimal performance and efficiency. Setting constraints too high means spare cluster capacity is left unused, whereas overly restrictive values reduce performance and compromise workload stability when pods prematurely terminate.
You can monitor the actual resource usage of your pods by installing the Kubernetes Metrics Server in your cluster. This lets you use the `kubectl top` command to inspect your pods:
```
# View pod resource consumption
$ kubectl top pod
NAME                               CPU(cores)   MEMORY(bytes)
cilium-nqxns                       54m          227Mi
cilium-operator-64d49bc6f9-shkvm   2m           30Mi
coredns-d767c746-2mktq             3m           35Mi
```
## Common Issues and Solutions When Managing Kubernetes Requests and Limits
Kubernetes requests and limits are generally straightforward to administer once you understand how they interact. Nonetheless, there are a few possible gotchas that can cause problems as you set your values:
- Incorrect requests/limits due to wrong units: Confusion around quantity units can cause pods to be incorrectly configured. CPU is expressed as millicores (`100m`) or a fractional amount of a whole core (`0.1`), whereas memory is measured in bytes with an optional quantity suffix such as `1Gi` (gibibytes). Getting these units wrong will produce unexpected outcomes.
- Pods terminating before their memory limit is reached: Pods can sometimes terminate before their memory limit is reached. This happens when you set the pod’s memory limit higher than its memory request. The pod may be evicted from its node if resource contention occurs while its memory use is higher than the requested amount.
- Poor performance but low CPU load: Poorly performing workloads that show signs of underutilization of available CPU cycles are generally caused by incorrect use of CPU limits. You should avoid setting CPU limits so that pods are able to freely use spare capacity when it’s available.
- Setting overall resource limits for a team or application: Plain requests and limits work at the pod level, but you’ll often need to set higher-level constraints as well. Resource quotas can be used to set overall resource usage constraints at the namespace level, allowing you to distribute resources between different team members, applications, and environments.
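As a minimal sketch of the namespace-level approach mentioned above, a ResourceQuota like the following caps the combined requests and limits of every pod in a namespace. The quota and namespace names here are hypothetical, and the hard caps are illustrative values you would tune for your own teams:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota     # hypothetical name
  namespace: team-a      # hypothetical namespace
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.memory: 8Gi
```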
These tips will help you use requests and limits to reliably control cluster utilization.
In this article, you learned how to set Kubernetes requests and limits to precisely define CPU and memory usage constraints for your pods. Requests are used by the scheduling process to evaluate which nodes can accept a new pod, while limits are enforced during a pod’s runtime to prevent excessive resource consumption.
Properly setting Kubernetes requests and limits will keep your workloads performant and maintain your cluster’s stability. They also help to ensure efficient cluster utilization, which reduces waste and prevents excessive costs. For these reasons, requests and limits should always be set on your pods, and most Kubernetes configuration scanning tools will flag their absence.