One of the most powerful features of Kubernetes is autoscaling, as it's vital to find the correct balance when scaling resources in our infrastructure. Scale up more than needed, and you will pay for unused resources. Scale down more than required, and your application will not be performant.
Kubernetes brings three types of auto-scaling to the table:
- Cluster Autoscaler
- Horizontal Pod Autoscaler
- Vertical Pod Autoscaler
The Cluster Autoscaler scales the nodes up or down depending on the pods' CPU and memory requests. If a pod cannot be scheduled due to its resource requests, a node will be created to accommodate it. On the other side, if nodes have no workloads running, they can be terminated.
The Horizontal Pod Autoscaler scales the number of pods of an application based on resource metrics such as CPU or memory usage, or on custom metrics. It can act on replication controllers, deployments, replica sets, and stateful sets. Custom and external metrics are also supported, so they can be used by other autoscalers within the cluster as well.
The Vertical Pod Autoscaler is responsible for adjusting the CPU and memory requests and limits of the pods themselves.
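For illustration, a minimal Vertical Pod Autoscaler manifest might look like the sketch below. This assumes the VPA components, which ship separately from Kubernetes, are installed in the cluster; the target name is illustrative.

```yaml
# Sketch of a VPA object; requires the Vertical Pod Autoscaler
# components to be installed in the cluster separately.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-servers-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-servers
  updatePolicy:
    updateMode: "Auto"  # VPA may evict pods to apply the new requests
```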
Horizontal Pod Autoscaler API Versions
The autoscaling/v1 API version is the stable default; it only supports CPU utilization-based autoscaling. The autoscaling/v2beta2 version adds support for multiple metrics, as well as custom and external metrics.
You can verify which API versions your cluster supports by querying the API versions:
$ kubectl api-versions | grep autoscaling
An output similar to the following will be displayed. It will list all supported versions; in this case, we see that all three versions are supported.
autoscaling/v1
autoscaling/v2beta1
autoscaling/v2beta2
Requirements
Horizontal Pod Autoscaler (and also Vertical Pod Autoscaler) requires a Metrics Server installed in the Kubernetes cluster. Metrics Server is a source of container resource metrics (such as memory and CPU usage) that is scalable, can be configured for high availability, and is efficient in its resource usage. Metrics Server gathers metrics from the kubelets every 15 seconds by default, which allows rapid autoscaling.
You can easily check whether the Metrics Server is installed by issuing the following command:
$ kubectl top pods
The following message will be shown if the metrics server is not installed.
error: Metrics API not available
On the other hand, if the Metrics Server is installed, similar output will be displayed for each pod in the given namespace.
NAME CPU(cores) MEMORY(bytes)
metrics-server-7d9f89855d-l4rrz 7m 17Mi
Installation of Metrics Server
If you have already installed Metrics Server, you can skip this section.
Metrics Server offers two easy installation mechanisms. The first is applying the release manifest directly with kubectl:
$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
The second option, which is preferred, is using the Helm chart; its configurable values are documented in the chart repository.
First, add the Metrics-Server Helm repository to your local repository list as follows.
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
Now you can install the Metrics Server via Helm.
helm upgrade --install metrics-server metrics-server/metrics-server
If your kubelets serve self-signed certificates, you should add --set args={--kubelet-insecure-tls} to the command above.
You should see output similar to the following:
Release "metrics-server" does not exist. Installing it now.
NAME: metrics-server
LAST DEPLOYED: Wed Sep 22 16:16:55 2021
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
***********************************************************************
* Metrics Server *
***********************************************************************
Chart version: 3.5.0
App version: 0.5.0
Image tag: k8s.gcr.io/metrics-server/metrics-server:v0.5.0
Verifying the Installation
Once the installation has finished and the Metrics Server has had some time to get ready, let's try the command again.
$ kubectl top pods
NAME CPU(cores) MEMORY(bytes)
metrics-server-7d9f89855d-l4rrz 7m 15Mi
Also, we can see the resources of the nodes with a similar command.
$ kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
docker-desktop 370m 4% 2188Mi 57%
You can also send queries directly to the Metrics Server via kubectl.
$ kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes | jq
An output similar to below will be displayed.
{
  "kind": "NodeMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metadata": {
        "name": "docker-desktop",
        "creationTimestamp": "2021-10-04T12:33:01Z",
        "labels": {
          "beta.kubernetes.io/arch": "amd64",
          "beta.kubernetes.io/os": "linux",
          "kubernetes.io/arch": "amd64",
          "kubernetes.io/hostname": "docker-desktop",
          "kubernetes.io/os": "linux",
          "node-role.kubernetes.io/master": ""
        }
      },
      "timestamp": "2021-10-04T12:32:07Z",
      "window": "1m0s",
      "usage": {
        "cpu": "380139514n",
        "memory": "2077184Ki"
      }
    }
  ]
}
We can also verify our pod's metrics from the API.
$ kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/default/pods/web-servers-65c7fc644d-5h6mb | jq
{
  "kind": "PodMetrics",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "name": "web-servers-65c7fc644d-5h6mb",
    "namespace": "default",
    "creationTimestamp": "2021-10-04T12:36:48Z",
    "labels": {
      "app": "web-servers",
      "pod-template-hash": "65c7fc644d"
    }
  },
  "timestamp": "2021-10-04T12:35:55Z",
  "window": "54s",
  "containers": [
    {
      "name": "nginx",
      "usage": {
        "cpu": "0",
        "memory": "6860Ki"
      }
    }
  ]
}
You might come across a situation similar to the following, where the Metrics Server cannot get the current CPU usage of the containers in the pod.
$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
web-servers Deployment/web-servers <unknown>/20% 1 10 1 8m6s
This means either the Metrics Server control loop hasn't run yet, the Metrics Server is not running correctly, or resource requests are not set on the target pod spec.
Configuring Horizontal Pod AutoScaling
As we have two API versions of this object, it would be good to examine both; however, autoscaling/v2beta2 is the recommended version at the time of writing.
Let's create a simple deployment first; we will be using the Nginx image.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-servers
  labels:
    app: web-servers
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web-servers
  template:
    metadata:
      labels:
        app: web-servers
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - containerPort: 80
          resources:
            limits:
              cpu: 100m
            requests:
              cpu: 50m
Let's create a service.
apiVersion: v1
kind: Service
metadata:
  labels:
    app: web-servers
  name: web-servers
  namespace: default
spec:
  ports:
    - name: web-servers-port
      port: 80
  selector:
    app: web-servers
  sessionAffinity: None
  type: NodePort
At this point, you need to choose which API version you would use; we will show examples for both.
autoscaling/v1 API Version
Lastly, let's configure a HorizontalPodAutoscaler targeting the web-servers deployment, using the autoscaling/v1 API version for those who choose it.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web-servers-v1
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-servers
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 20
autoscaling/v2beta2 API Version
Here we have the newer version of the API, where we can use multiple metrics. In our example, we define two metrics: one for CPU and one for memory usage.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: web-servers
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-servers
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 20
    - type: Resource
      resource:
        name: memory
        target:
          type: AverageValue
          averageValue: 30Mi
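Beyond resource metrics, autoscaling/v2beta2 also accepts Pods-type custom metrics. As a sketch, assuming a custom metrics adapter (such as Prometheus Adapter) serves the custom.metrics.k8s.io API and exposes a hypothetical http_requests_per_second metric, an additional entry in the metrics list could look like this:

```yaml
# Hypothetical custom metric entry; requires a custom metrics adapter
# (e.g. Prometheus Adapter) installed in the cluster, and the metric
# name below is illustrative.
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "100"
```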
Let's check the HPA entries.
$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
web-servers Deployment/web-servers 6930432/30Mi, 0%/20% 1 10 1 10d
We can also use the describe subcommand to gather more information.
$ kubectl describe hpa web-servers
Name: web-servers
Namespace: default
Labels: <none>
Annotations: <none>
CreationTimestamp: Mon, 04 Oct 2021 15:39:00 +0300
Reference: Deployment/web-servers
Metrics: ( current / target )
resource memory on pods: 6930432 / 30Mi
resource cpu on pods (as a percentage of request): 0% (0) / 20%
Min replicas: 1
Max replicas: 10
Deployment pods: 1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from memory resource
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events: <none>
Operation of Horizontal Pod AutoScaling
Let's create a load of web traffic destined for our web-servers service and examine the effect. For the load, we will use hey, a tiny web load generator; you can use a bash script with curl/wget commands if you prefer.
First, let's port-forward the service that we had created for web-servers pods.
$ kubectl port-forward svc/web-servers 8080:80
Run the hey command from your local shell with -n 10000 -c 5, meaning it will send 10,000 requests with five concurrent workers.
$ hey -n 10000 -c 5 http://localhost:8080/
To see the effects of the load, let's check the HPA entry.
$ kubectl get hpa web-servers
At this point, we can see that CPU and memory usage has dramatically increased.
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
web-servers Deployment/web-servers 20049920/30Mi, 48%/20% 1 10 1 14d
After a short delay, the Horizontal Pod Autoscaler gets the new metrics for the pod and calculates the number of replicas needed to scale up or down.
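The replica count the HPA arrives at follows the scaling formula documented by Kubernetes: desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue). A minimal sketch in Python:

```python
import math

def desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    """Kubernetes HPA scaling formula:
    desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue)
    """
    return math.ceil(current_replicas * current_value / target_value)

# One replica running at 48% CPU against a 20% target yields 3 desired
# replicas, consistent with the rescale events shown later in this article.
print(desired_replicas(1, 48, 20))  # 3
```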
$ kubectl get hpa web-servers
Autoscaling is in effect; a total of 10 replicas were created.
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
web-servers Deployment/web-servers 9233066666m/30Mi, 66%/20% 1 10 10 11d
We can take a more detailed look using the describe subcommand.
$ kubectl describe hpa web-servers
Conditions and events fields are crucial for troubleshooting and understanding the behavior of the HPA.
Name: web-servers
Namespace: default
Labels: <none>
Annotations: <none>
CreationTimestamp: Mon, 04 Oct 2021 15:39:00 +0300
Reference: Deployment/web-servers
Metrics: ( current / target )
resource memory on pods: 9233066666m / 30Mi
resource cpu on pods (as a percentage of request): 66% (33m) / 20%
Min replicas: 1
Max replicas: 10
Deployment pods: 10 current / 10 desired
Conditions:
  Type            Status  Reason               Message
  ----            ------  ------               -------
  AbleToScale     True    ScaleDownStabilized  recent recommendations were higher than current one, applying the highest recent recommendation
  ScalingActive   True    ValidMetricFound     the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  True    TooManyReplicas      the desired replica count is more than the maximum replica count
Events:
  Type    Reason             Age   From                       Message
  ----    ------             ---   ----                       -------
  Normal  SuccessfulRescale  4m1s  horizontal-pod-autoscaler  New size: 3; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  3m1s  horizontal-pod-autoscaler  New size: 6; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  2m    horizontal-pod-autoscaler  New size: 10; reason: cpu resource utilization (percentage of request) above target
Also, we can check the deployment object to see events and several other fields related to autoscaling.
$ kubectl describe deployments web-servers
Name: web-servers
Namespace: default
CreationTimestamp: Mon, 04 Oct 2021 15:43:14 +0300
Labels: app=web-servers
Annotations: deployment.kubernetes.io/revision: 3
Selector: app=web-servers
Replicas: 10 desired | 10 updated | 10 total | 10 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=web-servers
Containers:
nginx:
Image: nginx
Port: 80/TCP
Host Port: 0/TCP
Limits:
cpu: 100m
Requests:
cpu: 50m
Environment: <none>
Mounts: <none>
Volumes: <none>
Conditions:
  Type         Status  Reason
  ----         ------  ------
  Progressing  True    NewReplicaSetAvailable
  Available    True    MinimumReplicasAvailable
OldReplicaSets: <none>
NewReplicaSet: web-servers-77cbb55d6 (10/10 replicas created)
Events:
  Type    Reason             Age    From                   Message
  ----    ------             ---    ----                   -------
  Normal  ScalingReplicaSet  4m50s  deployment-controller  Scaled up replica set web-servers-77cbb55d6 to 3
  Normal  ScalingReplicaSet  3m50s  deployment-controller  Scaled up replica set web-servers-77cbb55d6 to 6
  Normal  ScalingReplicaSet  2m49s  deployment-controller  Scaled up replica set web-servers-77cbb55d6 to 10
Here are all the replicas created.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
metrics-server-7d9f89855d-l4rrz 1/1 Running 13 23d
web-servers-77cbb55d6-2vrn5 1/1 Running 0 3m30s
web-servers-77cbb55d6-7ps7k 1/1 Running 0 5m31s
web-servers-77cbb55d6-8brrm 1/1 Running 0 4m31s
web-servers-77cbb55d6-gsrk8 1/1 Running 0 4m31s
web-servers-77cbb55d6-jwshp 1/1 Running 0 11d
web-servers-77cbb55d6-qg9fz 1/1 Running 0 3m30s
web-servers-77cbb55d6-ttjz2 1/1 Running 0 3m30s
web-servers-77cbb55d6-wzbwt 1/1 Running 0 5m31s
web-servers-77cbb55d6-xxf7q 1/1 Running 0 3m30s
web-servers-77cbb55d6-zxglt 1/1 Running 0 4m31s
Conclusion
We have seen how to configure HPA using both the old and the new API versions. With the capability of using multiple metrics, we can develop more complex scaling strategies, and with the custom metrics option, we can bring in application-specific instrumentation and use it for scaling.
After the configuration, we had a quick demo of an HPA configuration and observed the commands to review the metrics and events.
Horizontal Pod Autoscaling allows us to scale our applications based on different metrics. By scaling to the correct number of pods dynamically, we can serve our application in a performant and cost-efficient manner.