How to Set Up Kubernetes Requests and Limits

James Walker

Kubernetes requests and limits define the amount of CPU and memory that pods and containers are able to consume. Setting appropriate requests and limits on your pods prevents resource contention and helps ensure efficient scheduling to your cluster's nodes.

Unexpected resource consumption is one of the most serious challenges that Kubernetes administrators encounter. A single misconfiguration or spike in user activity can cause containers to consume excessive CPU or memory. This often has unintended consequences for neighboring pods: if a node's memory capacity is exhausted, the operating system's out-of-memory (OOM) killer will intervene by terminating processes, taking their containers down with them.

In this article, you'll learn how to use Kubernetes requests and limits to control resource consumption and prevent runaway usage in one pod from negatively impacting your other pods.

Setting Up Kubernetes Requests

Kubernetes requests specify the minimum amount of a resource (CPU or memory) that a pod requires to run. This information is used by the Kubernetes scheduler to determine which node in the cluster the pod should be placed on.

Resource requests are defined in the spec.containers.resources.requests field of your pod manifests. The following example creates a pod that requests 100 CPU millicores and 150 mebibytes of memory:

apiVersion: v1
kind: Pod
metadata:
  name: requests-demo
spec:
  containers:
    - name: nginx
      image: nginx:latest
      resources:
        requests:
          cpu: "100m"
          memory: "150Mi"

It's possible to define CPU constraints as fractional values instead of an absolute quantity in millicores. In this case, 1.0 is equivalent to 1 CPU core, and 0.5 is half a CPU core. These values translate to the millicore quantities of 1000m and 500m, respectively.
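
For example, the following snippet (shortened to just the resources field of a container) requests half a core using the fractional form; it's interpreted identically to writing "500m":

resources:
  requests:
    cpu: "0.5"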

Save the full pod manifest above as requests-demo.yaml, then use kubectl to add the pod to your cluster:

$ kubectl apply -f requests-demo.yaml
pod/requests-demo created

The kubectl describe command will reveal the requests applied to the pod within the "Containers" section of its output:

$ kubectl describe pod requests-demo
Name:             requests-demo
Namespace:        default
...
Containers:
  nginx:
    ...
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        100m
      memory:     150Mi
    Environment:  <none>

How Kubernetes Uses Requests

Pods that declare a resource request will only schedule to nodes that can provide the requested quantity of the resource. This is determined by calculating the sum of the existing requests made by the pods already running on each node. Here's an example:

| Node Memory Capacity | New Pod Memory Request | Sum of Existing Pods' Memory Requests | New Pod Allowed to Schedule? |
|----------------------|------------------------|---------------------------------------|------------------------------|
| 2000 Mi              | 500 Mi                 | 0 Mi                                  | Yes                          |
| 2000 Mi              | 500 Mi                 | 1000 Mi                               | Yes                          |
| 2000 Mi              | 500 Mi                 | 1600 Mi                               | No                           |

The last combination fails because the new pod's request of 500 MiB cannot be satisfied; the node's capacity is 2000 MiB, and the pods already running on the node are requesting 1600 MiB. The new pod is prevented from scheduling because it would cause the total memory request to exceed the node's capacity. CPU requests are considered in the same way.
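
You can see how much of a node's capacity is already reserved by running kubectl describe node. Its "Allocated resources" section sums the requests and limits of the pods scheduled to the node. The output below is abbreviated and the figures are illustrative; node names will vary in your cluster:

$ kubectl describe node <your-node-name>
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                850m (21%)    0 (0%)
  memory             1600Mi (80%)  0 (0%)
...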

Requests don't limit the actual resource usage of pods once they're scheduled to nodes. Pods are free to use more CPU and memory than their requests allow when there is spare capacity on the node.

CPU requests act as weights that preserve quality of service when resource contention occurs. If the CPU is at 100 percent utilization, then pods will be able to use the proportion of the available CPU time that matches the weight of their request:

| Available CPU | Pod 1 CPU Request | Pod 2 CPU Request | Pod 1 Effective CPU | Pod 2 Effective CPU |
|---------------|-------------------|-------------------|---------------------|---------------------|
| 4000m         | 1500m             | 500m              | 3000m               | 1000m               |

The table above describes a situation where both pods are trying to use more CPU than they requested. Because CPU requests act as weights, the node's 4000m of CPU time is divided in the 3:1 ratio of the pods' requests: both pods consume more than their request defines, but Pod 1 continues to receive the greater share of the available capacity.
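
Under the hood, the kubelet implements these weights with Linux cgroup CPU shares (cgroup v1) or weights (cgroup v2). As a rough check on a node using cgroup v2, you can read the weight derived from the pod's request from inside the container; this assumes the image provides cat, and the exact value depends on the kubelet's conversion formula:

$ kubectl exec requests-demo -- cat /sys/fs/cgroup/cpu.weight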

Exceeding a memory request has slightly different effects due to the finite nature of memory. Containers are able to use more memory when it's available, but doing so makes them candidates for eviction if the node begins to experience memory pressure. When Kubernetes needs to free up memory, pods using more than their request will be the first to be targeted.

Setting Up Kubernetes Limits

A limit is a second kind of constraint that prevents pods from using more than a specified amount of a resource. Whereas requests manage workload-level contention and quality of service, limits are a device for maintaining cluster stability.

Limits are set in a similar way to requests by using the spec.containers.resources.limits field in your pod manifests:

apiVersion: v1
kind: Pod
metadata:
  name: limits-demo
spec:
  containers:
    - name: nginx
      image: nginx:latest
      resources:
        limits:
          cpu: 100m
          memory: 150Mi

You can save the pod manifest to limits-demo.yaml, apply it to your cluster, and check that the limits have been set:

$ kubectl apply -f limits-demo.yaml
pod/limits-demo created

$ kubectl describe pod limits-demo
Name:             limits-demo
Namespace:        default
...
Containers:
  nginx:
    ...
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:        100m
      memory:     150Mi
    Requests:
      cpu:        100m
      memory:     150Mi
    Environment:  <none>

Limits are designed to be used in conjunction with requests. If you set a resource limit but omit its request—as in this example—then Kubernetes will automatically apply a request that's equal to your declared limit.
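
You can verify this defaulting behavior by reading the resources field back from the API server. For the limits-demo pod above, requests appear even though the manifest never declared them (output formatting may vary slightly between kubectl versions):

$ kubectl get pod limits-demo -o jsonpath='{.spec.containers[0].resources}'
{"limits":{"cpu":"100m","memory":"150Mi"},"requests":{"cpu":"100m","memory":"150Mi"}}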

How Kubernetes Uses Limits

Kubernetes uses resource limits to identify pods that are depriving their neighbors of resources and could destabilize the cluster. Limits indicate that the pod must not be allowed to use more than a defined amount of a resource, but enforcement is applied differently for CPU and memory constraints:

  • Exceeding a CPU limit will not result in a pod being terminated, but it may result in it being throttled to preserve the performance of neighboring pods. Throttling will not occur if there is spare CPU capacity on the node.
  • Exceeding a memory limit results in the system's out-of-memory (OOM) killer intervening. This kills the process that tried to allocate the additional memory. As there's usually only one process in a container, this normally causes the container to terminate and restart. You can confirm an OOM kill from the pod's status, as shown below.

In short, limits let you set a hard cap on pod resource utilization. Correctly configured limits prevent individual pods from destabilizing the cluster and starving their neighbors.
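
When a container is OOM killed, the evidence shows up in its status: running kubectl describe pod against an affected pod reveals a termination reason of OOMKilled and exit code 137 (output abbreviated):

$ kubectl describe pod limits-demo
...
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
...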

Best Practices for Setting Kubernetes Requests and Limits

Requests and limits require some finessing. Poorly chosen values can reduce your cluster's performance or cause waste. For example, if a request is set higher than your application actually requires, some cluster capacity is permanently reserved for your pod even though it's never used.

Below are a few best practices to follow as you set your resource constraints.

Don't Use CPU Limits

CPU limits aren't required to guarantee a fair allocation of CPU time. Imposing a limit on a pod throttles its CPU usage even when there are spare CPU cycles available, which can needlessly reduce performance.

Instead, it's best to set a CPU request on each of your pods while omitting a limit. When resource contention occurs, the pods will be granted the proportion of available CPU time that matches the relative weight of their request. During periods of no contention, each pod can access the CPU's full performance.

The following configuration sets a request for 100 millicores, equivalent to 0.1 of a CPU core. In practice, the pod can use more CPU capacity when it's available, but under contention it will always receive half as much CPU as a pod with a request of 200m (0.2):

resources:
  requests:
    cpu: 100m

Always Set Equal Memory Requests and Limits

Unlike CPU time, memory is non-compressible: it can't be throttled, only reclaimed by terminating the processes that hold it. While you can set different memory requests and limits for a pod, doing so is generally undesirable and can result in confusing behavior.

You will usually want to allocate a specific amount of memory to a pod, then have it terminate when it tries to consume more. This matches the behavior of most other memory allocation systems outside Kubernetes.

By setting equal memory requests and limits, you ensure each pod has access to the memory it needs but will be terminated as soon as it uses more. Setting limits higher than requests exposes you to unexpected pod evictions if the cluster has to reclaim memory while the pod is exceeding its request.

The following snippet guarantees that the pod can allocate 500 MiB of memory but will be prevented from consuming more:

resources:
  requests:
    memory: 500Mi
  limits:
    memory: 500Mi
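
One side effect worth knowing: requests and limits determine a pod's QoS class, which you can read from the pod's status. A pod whose containers all have equal CPU and memory requests and limits (like limits-demo earlier) is classed as Guaranteed, while the memory-only pattern shown here yields Burstable, since its CPU limit is omitted:

$ kubectl get pod limits-demo -o jsonpath='{.status.qosClass}'
Guaranteed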
    

Regularly Review and Rightsize Your Requests and Limits

It's important to ensure your requests and limits are appropriate for your workload. To achieve this, you should first set the values to the highest amount you anticipate needing, then regularly tune them based on actual utilization.

Rightsizing resource allocations to match your workload's requirements will ensure optimal performance and efficiency. Setting constraints too high means spare cluster capacity is left unused, whereas overly restrictive values reduce performance and compromise workload stability when pods prematurely terminate.

You can monitor the actual resource usage of your pods by installing the Kubernetes Metrics Server in your cluster. This lets you use the kubectl top command to inspect your pods:

# View pod resource consumption
$ kubectl top pod
NAME                               CPU(cores)   MEMORY(bytes)
cilium-nqxns                       54m          227Mi
cilium-operator-64d49bc6f9-shkvm   2m           30Mi
coredns-d767c746-2mktq             3m           35Mi
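
For pods with multiple containers, the --containers flag breaks usage down per container so you can compare each one against its own request. The pod and container names below are illustrative:

$ kubectl top pod my-app-pod --containers
POD          NAME      CPU(cores)   MEMORY(bytes)
my-app-pod   app       12m          180Mi
my-app-pod   sidecar   1m           20Mi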
    

Common Issues and Solutions When Managing Kubernetes Requests and Limits

Kubernetes requests and limits are generally straightforward to administer once you understand how they interact. Nonetheless, there are a few possible gotchas that can cause problems as you set your values:

  • Incorrect requests/limits due to wrong units: Confusion around quantity units can cause pods to be incorrectly configured. CPU is expressed as millicores (100m) or a relative amount of a whole core (0.1), whereas memory is measured in bytes with an optional quantity suffix such as 400M (megabytes), 400Mi (mebibytes), or 1Gi (gibibytes). Beware the lowercase m in particular: a memory value of 400m means 400 millibytes (0.4 bytes), not 400 megabytes.
  • Pods terminating before their memory limit is reached: Pods can sometimes terminate before their memory limit is reached. This happens when you set the pod's memory limit higher than its memory request. The pod may be evicted from its node if resource contention occurs while its memory use is higher than the requested amount.
  • Poor performance but low CPU load: Poorly performing workloads that show signs of underutilization of available CPU cycles are generally caused by incorrect use of CPU limits. You should avoid setting CPU limits so that pods are able to freely use spare capacity when it's available.
  • Setting overall resource limits for a team or application: Plain requests and limits work at the pod level, but you'll often need to set higher-level constraints as well. Resource quotas can be used to set overall resource usage constraints at the namespace level, allowing you to distribute resources between different team members, applications, and environments; see the sketch after this list.

These tips will help you use requests and limits to reliably control cluster utilization.
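
As a sketch of that namespace-level approach, the following ResourceQuota caps the total requests and limits of all pods in a namespace; the name, namespace, and figures are illustrative:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.memory: 16Gi

Note that once a compute resource quota like this is active, every new pod in the namespace must explicitly declare the constrained requests and limits.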

Conclusion

In this article, you learned how to set Kubernetes requests and limits to precisely define CPU and memory usage constraints for your pods. Requests are used by the scheduling process to evaluate which nodes can accept a new pod, while limits are enforced during a pod's runtime to prevent excessive resource consumption.

Properly setting Kubernetes requests and limits will keep your workloads performant and maintain your cluster's stability. They also help to ensure efficient cluster utilization, which reduces waste and prevents excessive costs. For these reasons, requests and limits should always be set on your pods, and most Kubernetes configuration scanners will flag pods that omit them.
