What Does It Mean to Scale a Deployment?

Dawid Ziolkowski

Kubernetes can do a lot of things for you. It can manage secrets, create load balancers, provision and destroy cloud storage for your pods, and much more.

But at its core, the very point of Kubernetes is to manage deployments. I say "manage" because it's not only about running the deployments but also about constantly checking that they're still running and conform to the specification.

Kubernetes is really good at that. It can automatically detect which node in your cluster is best suited to run a pod, and in the case of a node failure, it'll automatically respawn your pod on another node.

Another deployment-related feature of Kubernetes is scaling. It's possible to scale a deployment up and down, either manually or automatically, depending on different factors.

In this post, you'll learn all about how to scale Kubernetes deployments using the kubectl scale command.

What Does It Mean to Scale a Deployment?

Let's get right to it. I assume you're here because you already have some deployment running on your Kubernetes cluster and want to scale it. What does it mean to "scale a deployment" on Kubernetes? It means changing the number of replicas for a deployment.

As you probably already know, when you create a deployment in Kubernetes, one of the parameters you can specify in the spec section of a Kubernetes definition file is replicas. If you don't specify it, it defaults to 1, which means you'll get only one pod of your deployment running.

And while in some cases that's OK, in general, one of the main points of Kubernetes is to run highly available microservices. Therefore, you usually want to run more than one pod.

So, let's assume that you created a deployment without specifying a replica count, and so you have only one running pod for your deployment. You want to make your application more resilient to node failures, so you decide to run four pods for that deployment instead.

You could simply find the YAML definition of your deployment, edit it to add a replicas parameter with a value of 4, and reapply the deployment from a file. But you can also change the replicas count for a running deployment directly on the cluster. That's where scaling a deployment comes into play.
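For reference, here's what that change looks like in a minimal deployment manifest. This is a sketch: the labels and container image are placeholders for illustration, not part of the example above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 4          # the field that scaling changes
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: example-app
          image: nginx:1.25   # placeholder image
```

Reapplying this file with kubectl apply -f would bring the deployment to four replicas, just like the scale command does.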

Kubectl Scale

The easiest way to scale a deployment is to use the kubectl scale command, followed by the deployment you want to scale and the desired replica count after the --replicas flag. So, for example, if you want to scale a deployment called "example-app," you can execute:

kubectl scale deployment/example-app --replicas=4

That's it. Now your deployment will be scaled up from one to four replicas. You can execute that command as many times as you'd like. Each time, Kubernetes will create more pods or delete existing ones, depending on whether you scale up or down. Let's see that in action.

Our deployment is running now with four replicas, as per our scale action above.

$ kubectl get deployments
NAME          READY   UP-TO-DATE   AVAILABLE   AGE
example-app   4/4     4            4           54s

If we now execute the following command

kubectl scale deployment/example-app --replicas=2

Kubernetes will delete two pods and, therefore, scale down our deployment.

$ kubectl scale deployment/example-app --replicas=2
deployment.apps/example-app scaled
$ kubectl get deployments
NAME          READY   UP-TO-DATE   AVAILABLE   AGE
example-app   2/2     2            2           2m15s

Scaling to Zero

Guess what happens if you try to scale a deployment to zero replicas. An error message? No, actually. Kubernetes will just patiently execute your request and delete all the pods belonging to your deployment.

$ kubectl scale deployment/example-app --replicas=0
deployment.apps/example-app scaled
$ kubectl get deployments
NAME          READY   UP-TO-DATE   AVAILABLE   AGE
example-app   0/0     0            0           3m24s

What does this mean? Scaling deployment to zero effectively means disabling the deployment. It's disabling, not deleting, because the deployment object will still exist, as you can see in the code snippet above. It just doesn't have any running pods.

In fact, scaling to zero is a common approach when you temporarily don't need a deployment. If you know you'll need it again soon, instead of deleting it and applying it again later, you can scale it to zero and then scale it back to the desired count later.

Not only is this approach a little bit faster than deleting and re-creating the whole deployment, but it also gives you the added benefit of easily seeing the history of changes to the deployment. If you execute the following,

kubectl describe deployment example-app

at the bottom of the output, you'll see the list of all previous scaling actions.


| Type   | Reason            | Age   | From                  | Message                                            |
|--------|-------------------|-------|-----------------------|----------------------------------------------------|
| Normal | ScalingReplicaSet | 11m   | deployment-controller | Scaled up replica set example-app-9456bbbf9 to 1   |
| Normal | ScalingReplicaSet | 10m   | deployment-controller | Scaled up replica set example-app-9456bbbf9 to 4   |
| Normal | ScalingReplicaSet | 9m7s  | deployment-controller | Scaled down replica set example-app-9456bbbf9 to 2 |
| Normal | ScalingReplicaSet | 7m58s | deployment-controller | Scaled down replica set example-app-9456bbbf9 to 0 |

Scaling to 10,000

We know now what will happen if you scale to zero. Now, let's go the other way.

What happens if you try to scale to a very large number?

Well, it depends. If you have a very large cluster and the desired number of replicas can all fit on the nodes, then Kubernetes will simply scale as requested. However, if you have a small cluster and you try to scale, let's say, to 10,000 replicas, one of two things will happen.

Most likely, your scaling action will just stop at a certain number of pods.

$ kubectl get deployments
NAME          READY       UP-TO-DATE   AVAILABLE   AGE
example-app   954/10000   954          954         4m42s

Kubernetes simply won't be able to fit more pods on your cluster, so it'll deploy as many as possible and then wait for more resources. In practice, that means waiting either until some other deployment is scaled down or until more nodes are added to the cluster.

In a less optimistic but very possible scenario, your cluster will become very slow and eventually just overload itself and stop responding.

$ kubectl get deployments
Unable to connect to the server: net/http: TLS handshake timeout

This can happen for the same reason as explained above. The cluster keeps trying to schedule 10,000 pods but can't, so the control plane constantly queries nodes about their available resources. If the requested replica count is far too high, this can effectively turn into a self-inflicted denial-of-service on your own cluster.


Automated Scaling

Now you know how to scale a deployment manually. Scaling deployments by hand with the kubectl scale command can be useful, but you don't want to do it every time there's, for example, a spike in traffic to your application.

In most situations, automated scaling can do the job for you. Kubernetes comes with something called HorizontalPodAutoscaler. As the name suggests, it's a construct that can automatically scale your pods horizontally.

Scaling horizontally means running more pods. There's also vertical scaling, which instead assigns more CPU and memory to the existing pods.

So, how does it work? You set up a HorizontalPodAutoscaler once, and it'll then monitor your deployment and scale it up or down based on some metric. Most commonly, that metric is the pods' CPU usage. However, you can also instruct the HorizontalPodAutoscaler to monitor custom metrics from your application. To create a HorizontalPodAutoscaler, you can use the kubectl autoscale command.

Let's see an example.

$ kubectl autoscale deployment example-app --cpu-percent=80 --min=1 --max=10
horizontalpodautoscaler.autoscaling/example-app autoscaled

In the command above, we instructed Kubernetes to monitor the CPU usage of our example-app deployment and to create new pods whenever the average CPU usage for the deployment rises above 80%, keeping the replica count between 1 and 10. You can get the details of a HorizontalPodAutoscaler by executing kubectl get hpa.

$ kubectl get hpa
NAME          REFERENCE                TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
example-app   Deployment/example-app   2%/80%    1         10        1          3m3s

You can see that we currently have very low CPU usage for our deployment. Therefore, there's only one replica running. But if the load increases, our HPA should detect that.

$ kubectl get hpa
NAME          REFERENCE                TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
example-app   Deployment/example-app   98%/80%   1         10        1          3m42s

And shortly after that, it should increase the number of pods to bring down the average CPU usage back below 80%.

$ kubectl get hpa
NAME          REFERENCE                TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
example-app   Deployment/example-app   53%/80%   1         10        4          4m11s

It's as simple as that.
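If you prefer declarative configuration over the imperative kubectl autoscale command, the same autoscaler can be expressed as an autoscaling/v2 manifest. This is a sketch whose values mirror the command's flags:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-app
spec:
  scaleTargetRef:           # the deployment to scale
    apiVersion: apps/v1
    kind: Deployment
    name: example-app
  minReplicas: 1            # --min
  maxReplicas: 10           # --max
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80   # --cpu-percent
```

Keeping the HPA in a file alongside the deployment makes it easier to version-control and review autoscaling parameters.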


Scaling a Kubernetes deployment is a common and useful operation. Combined with autoscaling, it can also be very powerful, making your application much more resilient to load spikes.

In this post, you learned how to scale a Kubernetes deployment manually using kubectl scale and how to configure a HorizontalPodAutoscaler. Both methods have their use cases, and both have their own limitations.

The kubectl scale command, when not used carefully, can even make your cluster unresponsive. HorizontalPodAutoscalers are most useful when used with custom metrics but can also surprise you with high cloud bills if your autoscaling parameters aren't properly tuned.

If you want to learn more about Kubernetes, check out our blog here.

This post was written by Dawid Ziolkowski. Dawid has 10 years of experience: first as a network/system engineer, then in DevOps, and most recently as a cloud-native engineer. He's worked for an IT outsourcing company, a research institute, a telco, a hosting company, and a consultancy, so he's gathered knowledge from many different perspectives. Nowadays he's helping companies move to the cloud and/or redesign their infrastructure for a more cloud-native approach.
