Kubernetes Horizontal Pod Autoscaling

Levent Ogut

One of the most powerful features of Kubernetes is autoscaling, as it’s vital that we find the correct balance when scaling resources in our infrastructures. Scale up more than needed, and you will have unused resources which you must pay for. Scale down more than required and your application will not be performant.

Kubernetes brings three types of auto-scaling to the table:

  • Cluster Autoscaler
  • Horizontal Pod Autoscaler
  • Vertical Pod Autoscaler

    The Cluster Autoscaler scales the number of nodes up/down depending on the pods' CPU and memory requests. If a pod cannot be scheduled due to its resource requests, a new node is created to accommodate it. On the other side, if nodes are left without any running workloads, they can be terminated.

    The Horizontal Pod Autoscaler scales the number of pods of an application based on resource metrics such as CPU or memory usage, or on custom metrics. It can act on ReplicationControllers, Deployments, ReplicaSets, and StatefulSets. Custom metrics and external metrics are supported, so they can be used by other autoscalers within the cluster as well.

    The Vertical Pod Autoscaler is responsible for adjusting the CPU and memory requests and limits of containers.
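
    For illustration, a minimal VerticalPodAutoscaler manifest is sketched below. This assumes the VPA components from the kubernetes/autoscaler project are installed in the cluster (they are not part of a default installation), and it reuses the web-servers deployment name introduced later in this article.

    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: web-servers
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web-servers
      updatePolicy:
        # "Auto" lets the VPA evict pods and recreate them with updated requests
        updateMode: "Auto"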

    Horizontal Pod Autoscaler API Versions

    API version autoscaling/v1 is the stable and default version; this version of the API only supports CPU utilization-based autoscaling.

    The autoscaling/v2beta2 version of the API brings support for multiple metrics, as well as custom and external metrics.

    You can verify which API versions are supported on your cluster by querying the available API versions.

    $ kubectl api-versions | grep autoscaling
    

    An output similar to the following will be displayed. It will list all supported versions; in this case, we see that all three versions are supported.

    autoscaling/v1
    autoscaling/v2beta1
    autoscaling/v2beta2
    

    Requirements

    Horizontal Pod Autoscaler (and also Vertical Pod Autoscaler) requires a Metrics Server installed in the Kubernetes cluster. Metrics Server is a source of container resource metrics (such as memory and CPU usage) that is scalable, can be configured for high availability, and is efficient in its resource usage. By default, Metrics Server gathers metrics from kubelets every 15 seconds, which allows rapid autoscaling.

    You can easily check whether the Metrics Server is installed by issuing the following command:

    $ kubectl top pods
    

    The following message will be shown if the Metrics Server is not installed.

    error: Metrics API not available
    

    On the other hand, if the Metrics Server is installed, a similar output will be displayed for each pod in the selected namespace.

    NAME                                     CPU(cores)   MEMORY(bytes)
    metrics-server-7d9f89855d-l4rrz          7m           17Mi
    

    Installation of Metrics Server

    If you have already installed Metrics Server, you can skip this section.

    Metrics Server offers two easy installation mechanisms. The first is using kubectl to apply a single manifest that includes all the required resources.

    $ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
    

    The second option, which is preferred, is using the Helm chart. The configurable Helm values can be found in the chart's repository.

    First, add the Metrics-Server Helm repository to your local repository list as follows.

    helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
    

    Now you can install the Metrics Server via Helm.

    helm upgrade --install metrics-server metrics-server/metrics-server
    

    If your cluster's kubelets use self-signed certificates, you should add --set args={--kubelet-insecure-tls} to the command above.
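
    For example, on a local cluster such as Docker Desktop, the full command would look like the following (the braces are quoted so that the shell does not expand them):

    helm upgrade --install metrics-server metrics-server/metrics-server \
      --set 'args={--kubelet-insecure-tls}'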

    You should see a similar output to the below:

    Release "metrics-server" does not exist. Installing it now.
    NAME: metrics-server
    LAST DEPLOYED: Wed Sep 22 16:16:55 2021
    NAMESPACE: default
    STATUS: deployed
    REVISION: 1
    TEST SUITE: None
    NOTES:
    ***********************************************************************
    * Metrics Server                                                      *
    ***********************************************************************
      Chart version: 3.5.0
      App version:   0.5.0
      Image tag:     k8s.gcr.io/metrics-server/metrics-server:v0.5.0
    

    Verifying the Installation

    Now that the installation has finished, let's allow some time for the Metrics Server to become ready and then try the command again.

    $ kubectl top pods
    
    NAME                                     CPU(cores)   MEMORY(bytes)
    metrics-server-7d9f89855d-l4rrz          7m           15Mi
    

    We can also see the resource usage of the nodes with a similar command.

    $ kubectl top nodes
    
    NAME             CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
    docker-desktop   370m         4%     2188Mi          57%
    

    You can also send queries directly to the Metrics Server API via kubectl; jq is optional here and only used for pretty-printing.

    $ kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes | jq
    

    An output similar to the below will be displayed. Note that CPU usage is reported in nanocores (the n suffix) and memory in kibibytes (Ki).

    {
      "kind": "NodeMetricsList",
      "apiVersion": "metrics.k8s.io/v1beta1",
      "metadata": {},
      "items": [
        {
          "metadata": {
            "name": "docker-desktop",
            "creationTimestamp": "2021-10-04T12:33:01Z",
            "labels": {
              "beta.kubernetes.io/arch": "amd64",
              "beta.kubernetes.io/os": "linux",
              "kubernetes.io/arch": "amd64",
              "kubernetes.io/hostname": "docker-desktop",
              "kubernetes.io/os": "linux",
              "node-role.kubernetes.io/master": ""
            }
          },
          "timestamp": "2021-10-04T12:32:07Z",
          "window": "1m0s",
          "usage": {
            "cpu": "380139514n",
            "memory": "2077184Ki"
          }
        }
      ]
    }
    

    We can also verify our pod's metrics from the API.

    $ kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/default/pods/web-servers-65c7fc644d-5h6mb | jq
    
    {
      "kind": "PodMetrics",
      "apiVersion": "metrics.k8s.io/v1beta1",
      "metadata": {
        "name": "web-servers-65c7fc644d-5h6mb",
        "namespace": "default",
        "creationTimestamp": "2021-10-04T12:36:48Z",
        "labels": {
          "app": "web-servers",
          "pod-template-hash": "65c7fc644d"
        }
      },
      "timestamp": "2021-10-04T12:35:55Z",
      "window": "54s",
      "containers": [
        {
          "name": "nginx",
          "usage": {
            "cpu": "0",
            "memory": "6860Ki"
          }
        }
      ]
    }
    

    You might come across a situation similar to the following, where the Metrics Server cannot get the current CPU usage of the containers in a pod.

    $ kubectl get hpa
    
    NAME          REFERENCE                TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
    web-servers   Deployment/web-servers   <unknown>/20%   1         10        1          8m6s
    

    This means that either the metrics haven't been collected yet, the Metrics Server is not running correctly, or resource requests are not set on the target pod's spec.
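
    A couple of quick checks can help narrow this down; the sketch below assumes the web-servers deployment used throughout this article.

    # Verify the Metrics API itself is responding
    $ kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods

    # Verify resource requests are set on the target pod spec
    $ kubectl get deployment web-servers \
        -o jsonpath='{.spec.template.spec.containers[0].resources.requests}'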

    Configuring Horizontal Pod Autoscaling

    As we have two API versions of this object, it would be good to examine both; however, autoscaling/v2beta2 is the recommended version to use at the time of writing.

    Let's create a simple deployment first; we will be using the Nginx image.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web-servers
      labels:
        app: web-servers
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: web-servers
      template:
        metadata:
          labels:
            app: web-servers
        spec:
          containers:
          - name: nginx
            image: nginx
            ports:
            - containerPort: 80
            resources:
              limits:
                cpu: 100m
              requests:
                cpu: 50m
    

    Let's create a service.

    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: web-servers
      name: web-servers
      namespace: default
    spec:
      ports:
      - name: web-servers-port
        port: 80
      selector:
        app: web-servers
      sessionAffinity: None
      type: NodePort
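
    Assuming you saved the two manifests above as deployment.yaml and service.yaml (the file names are arbitrary), apply them as follows:

    $ kubectl apply -f deployment.yaml
    $ kubectl apply -f service.yaml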
    

    At this point, you need to choose which API version to use; we will show examples for both.

    autoscaling/v1 API Version

    Let's configure a HorizontalPodAutoscaler targeting the web-servers deployment using the autoscaling/v1 API version, for those who choose it.

    apiVersion: autoscaling/v1
    kind: HorizontalPodAutoscaler
    metadata:
      name: web-servers-v1
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web-servers
      minReplicas: 1
      maxReplicas: 10
      targetCPUUtilizationPercentage: 20
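
    The same v1 autoscaler can also be created imperatively with kubectl autoscale, which is handy for quick experiments:

    $ kubectl autoscale deployment web-servers --cpu-percent=20 --min=1 --max=10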
    

    autoscaling/v2beta2 API Version

    Here we have the newer version of the API, where we can use multiple metrics. In our example, we define two metrics: one for CPU utilization and the other for memory usage.

    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: web-servers
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web-servers
      minReplicas: 1
      maxReplicas: 10
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 20
      - type: Resource
        resource:
          name: memory
          target:
            type: AverageValue
            averageValue: 30Mi
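
    From Kubernetes 1.18 onward, the autoscaling/v2beta2 API also accepts an optional behavior field to tune how fast the autoscaler reacts. A minimal sketch, which would be appended under spec in the manifest above, might look like this:

      behavior:
        scaleDown:
          # Wait 5 minutes before acting on a lower replica recommendation
          stabilizationWindowSeconds: 300
          policies:
          - type: Pods
            value: 1
            periodSeconds: 60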
    

    Let's check the HPA entries. The TARGETS column shows the current metric values against their configured targets.

    $ kubectl get hpa
    
    NAME          REFERENCE                TARGETS                MINPODS   MAXPODS   REPLICAS   AGE
    web-servers   Deployment/web-servers   6930432/30Mi, 0%/20%   1         10        1          10d
    

    We can also use the describe subcommand to gather more information.

    $ kubectl describe hpa web-servers
    Name:                                                  web-servers
    Namespace:                                             default
    Labels:                                                <none>
    Annotations:                                           <none>
    CreationTimestamp:                                     Mon, 04 Oct 2021 15:39:00 +0300
    Reference:                                             Deployment/web-servers
    Metrics:                                               ( current / target )
      resource memory on pods:                             6930432 / 30Mi
      resource cpu on pods  (as a percentage of request):  0% (0) / 20%
    Min replicas:                                          1
    Max replicas:                                          10
    Deployment pods:                                       1 current / 1 desired
    Conditions:
      Type            Status  Reason              Message
      ----            ------  ------              -------
      AbleToScale     True    ReadyForNewScale    recommended size matches current size
      ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from memory resource
      ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
    Events:           <none>
    

    Operation of Horizontal Pod Autoscaling

    Let's create a load of web traffic destined for our web-servers service and examine the effect. For load, we will use hey, a tiny web load generator; you can use a bash script with curl/wget commands if you prefer (see the sketch after the hey command below).

    First, let's port-forward the service that we created for the web-servers pods.

    $ kubectl port-forward svc/web-servers 8080:80
    

    Run the hey command from your local shell with -n 10000 -c 5, meaning it will send 10,000 requests with five workers concurrently.

    $ hey -n 10000 -c 5 http://localhost:8080/
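
    If you prefer not to install hey, a rough equivalent using a curl loop would be:

    $ for i in $(seq 1 10000); do curl -s -o /dev/null http://localhost:8080/; done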
    

    To see the effects of the load, let's check the HPA entry.

    $ kubectl get hpa web-servers
    

    At this point, we can see that CPU and memory usage has dramatically increased.

    NAME          REFERENCE                TARGETS                  MINPODS   MAXPODS   REPLICAS   AGE
    web-servers   Deployment/web-servers   20049920/30Mi, 48%/20%   1         10        1          14d
    

    After a short delay, the Horizontal Pod Autoscaler fetches the new metrics for the pods and calculates the number of replicas needed to scale up or down.
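
    The replica count follows the formula from the Kubernetes HPA documentation, shown below with a worked example: one replica at 48% CPU against a 20% target scales to three replicas, which matches the first rescale event shown further down.

    desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]

    # e.g. ceil(1 * 48% / 20%) = ceil(2.4) = 3 replicas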

    $ kubectl get hpa web-servers
    

    Autoscaling is in effect; a total of 10 replicas were created.

    NAME          REFERENCE                TARGETS                     MINPODS   MAXPODS   REPLICAS   AGE
    web-servers   Deployment/web-servers   9233066666m/30Mi, 66%/20%   1         10        10         11d
    

    We can take a more detailed look using the describe subcommand.

    $ kubectl describe hpa web-servers
    

    Conditions and events fields are crucial for troubleshooting and understanding the behavior of the HPA.

    Name:                                                  web-servers
    Namespace:                                             default
    Labels:                                                <none>
    Annotations:                                           <none>
    CreationTimestamp:                                     Mon, 04 Oct 2021 15:39:00 +0300
    Reference:                                             Deployment/web-servers
    Metrics:                                               ( current / target )
      resource memory on pods:                             9233066666m / 30Mi
      resource cpu on pods  (as a percentage of request):  66% (33m) / 20%
    Min replicas:                                          1
    Max replicas:                                          10
    Deployment pods:                                       10 current / 10 desired
    Conditions:
      Type            Status  Reason               Message
      ----            ------  ------               -------
      AbleToScale     True    ScaleDownStabilized  recent recommendations were higher than current one, applying the highest recent recommendation
      ScalingActive   True    ValidMetricFound     the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
      ScalingLimited  True    TooManyReplicas      the desired replica count is more than the maximum replica count
    Events:
      Type    Reason             Age   From                       Message
      ----    ------             ---   ----                       -------
      Normal  SuccessfulRescale  4m1s  horizontal-pod-autoscaler  New size: 3; reason: cpu resource utilization (percentage of request) above target
      Normal  SuccessfulRescale  3m1s  horizontal-pod-autoscaler  New size: 6; reason: cpu resource utilization (percentage of request) above target
      Normal  SuccessfulRescale  2m    horizontal-pod-autoscaler  New size: 10; reason: cpu resource utilization (percentage of request) above target
    

    Also, we can check the deployment object to see events and several other fields related to autoscaling.

    $ kubectl describe deployments web-servers
    
    Name:                   web-servers
    Namespace:              default
    CreationTimestamp:      Mon, 04 Oct 2021 15:43:14 +0300
    Labels:                 app=web-servers
    Annotations:            deployment.kubernetes.io/revision: 3
    Selector:               app=web-servers
    Replicas:               10 desired | 10 updated | 10 total | 10 available | 0 unavailable
    StrategyType:           RollingUpdate
    MinReadySeconds:        0
    RollingUpdateStrategy:  25% max unavailable, 25% max surge
    Pod Template:
      Labels:  app=web-servers
      Containers:
       nginx:
        Image:      nginx
        Port:       80/TCP
        Host Port:  0/TCP
        Limits:
          cpu:  100m
        Requests:
          cpu:        50m
        Environment:  <none>
        Mounts:       <none>
      Volumes:        <none>
    Conditions:
      Type           Status  Reason
      ----           ------  ------
      Progressing    True    NewReplicaSetAvailable
      Available      True    MinimumReplicasAvailable
    OldReplicaSets:  <none>
    NewReplicaSet:   web-servers-77cbb55d6 (10/10 replicas created)
    Events:
      Type    Reason             Age    From                   Message
      ----    ------             ---    ----                   -------
      Normal  ScalingReplicaSet  4m50s  deployment-controller  Scaled up replica set web-servers-77cbb55d6 to 3
      Normal  ScalingReplicaSet  3m50s  deployment-controller  Scaled up replica set web-servers-77cbb55d6 to 6
      Normal  ScalingReplicaSet  2m49s  deployment-controller  Scaled up replica set web-servers-77cbb55d6 to 10
    

    Here are all the replicas created.

    $ kubectl get pods
    
    NAME                                     READY   STATUS    RESTARTS   AGE
    metrics-server-7d9f89855d-l4rrz          1/1     Running   13         23d
    web-servers-77cbb55d6-2vrn5              1/1     Running   0          3m30s
    web-servers-77cbb55d6-7ps7k              1/1     Running   0          5m31s
    web-servers-77cbb55d6-8brrm              1/1     Running   0          4m31s
    web-servers-77cbb55d6-gsrk8              1/1     Running   0          4m31s
    web-servers-77cbb55d6-jwshp              1/1     Running   0          11d
    web-servers-77cbb55d6-qg9fz              1/1     Running   0          3m30s
    web-servers-77cbb55d6-ttjz2              1/1     Running   0          3m30s
    web-servers-77cbb55d6-wzbwt              1/1     Running   0          5m31s
    web-servers-77cbb55d6-xxf7q              1/1     Running   0          3m30s
    web-servers-77cbb55d6-zxglt              1/1     Running   0          4m31s
    

    Conclusion

    We have seen how to configure HPA using both the old and the new API versions. With the capability of using multiple metrics, we can develop more complex scaling strategies. Using the custom metrics option, we can expose application-specific instrumentation and use it for scaling.

    After the configuration, we ran a quick demo of the HPA in action and observed the commands used to review its metrics and events.

    Horizontal Pod Autoscaling allows us to scale our applications based on different metrics. By dynamically scaling to the correct number of pods, we can serve our application in a performant and cost-efficient manner.
