Multi-tenancy in Kubernetes: Comparing Isolation and Costs

Lukas Gentele
Daniele Polencic
9 min read

Having multiple tenants sharing a Kubernetes cluster makes sense from a cost perspective, but what’s the overhead?

How much should you invest to keep tenants isolated, and how does that compare to running several clusters?

Before examining the costs, let’s look at the scale of the problem.

Most teams partition their cluster by environments.

For example, ten teams might have three environments each (i.e. dev, test and prod).

If you partition the cluster by environment and team, you will have 30 distinct slices.

What happens when you scale to 50 teams?

You will end up with 150 slices, of course.

But what are the consequences of this decision?

Imagine you want to deploy an Ingress controller to manage incoming traffic.

You have two main choices:

  • You can deploy a single Ingress controller for all tenants to use.
  • You can have a dedicated Ingress controller per tenant.

A single ingress controller vs. dedicated ingress controllers

Let’s also assume that you wish to use the nginx-ingress controller with its default requests of 100 millicores and 90MB of memory.
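In the controller's Deployment, those defaults correspond to a requests block like the following (a sketch; the exact manifest comes from the controller's Helm chart):

resources:
  requests:
    cpu: 100m      # 100 millicores
    memory: 90Mi   # ~90MB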

If you decide to deploy a dedicated Ingress controller per tenant, you will end up with fifty times those values:

  • 50 x 100 millicores = 5vCPU
  • 50 x 90MB = 4.5GB

The closest EC2 instance that matches those specs is a c6i.2xlarge, priced at ~$250/month.

If you are happy to share a single Ingress controller for 50 tenants, your costs are only a fraction since you will only pay for 100 millicores and 90MB.

Is this realistic, though?

The traffic ingested by 50 tenants will likely need more than a single ingress controller and, on average, might consume more than the requested 100 millicores and 90MB.

And then, you need to consider the scenario where something breaks (or needs upgrading): all tenants are affected.

In other words, isolating tenants in a cluster has a cost that grows with the level of isolation you want.

Since Kubernetes is designed for soft multi-tenancy, it’s worth investigating different configurations for multi-tenancy, from soft to hard.

We decided to run an experiment and compare costs for three configurations with increasingly stronger levels of isolation:

  • Hierarchical Namespace controller for soft multi-tenancy.
  • vCluster for isolating control planes.
  • Karmada for managing a cluster per tenant (hard multi-tenancy).

Let’s start with the Hierarchical Namespace controller.

#Soft multi-tenancy with the Hierarchical Namespace Controller

The Hierarchical Namespace Controller is a component you install in the cluster that lets you nest namespaces.

The (clever) idea behind it is that all child namespaces inherit resources from the parent and can be infinitely nested.

Hierarchical Namespace Controller nested namespaces

So, if you create a Role in the parent namespace, the same resource will be made available to its children.

Under the hood, the controller computes the difference between the two namespaces and copies the resources.

Let’s look at an example.

After you’ve installed the controller, you can create a root namespace with the following command:

$ kubectl create ns parent

You can create a role in the parent namespace with:

$ kubectl -n parent create role test1 --verb=* --resource=pod

Now, let’s write a script to generate 50 child namespaces:

#!/bin/bash
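# Create 50 child namespaces under the "parent" namespace using the hns kubectl plugin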

for i in {1..50}
do
  kubectl hns create "tenant-$i" -n parent
done

If you list the roles in any of the child namespaces, you will notice that the role was propagated:

$ kubectl get roles -n tenant-1
NAME            CREATED AT
test1           2024-02-29T20:16:05Z

You can also display the relationships between namespaces as a tree:

$ kubectl hns tree parent
parent
├── [s] tenant-1
├── [s] tenant-10
├── [s] tenant-11
├── [s] tenant-12
├── [s] tenant-13
├── [s] tenant-14
├── [s] tenant-15
├── [s] tenant-16
# truncated output

But how are nested namespaces implemented?

Child namespaces are just regular Kubernetes namespaces, and you can verify that by listing them:

$ kubectl get namespaces
NAME              STATUS
default           Active
hnc-system        Active
kube-node-lease   Active
kube-public       Active
kube-system       Active
parent            Active
tenant-1          Active
tenant-10         Active
tenant-11         Active
tenant-12         Active
tenant-13         Active
# truncated output

Their relationships are stored in the Hierarchical Namespace Controller, which is also responsible for propagating resources from the parent namespace to its children.
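Under the hood, each kubectl hns create call creates a small SubnamespaceAnchor object in the parent namespace, which the controller reconciles into a child namespace. A rough sketch of that object (the API version may vary between HNC releases):

apiVersion: hnc.x-k8s.io/v1alpha2
kind: SubnamespaceAnchor
metadata:
  name: tenant-1
  namespace: parent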

The cost of running such an operator is low: the controller's default requests are 300MB of memory and 100 millicores of CPU.

But this has a few limitations.

In Kubernetes, resources such as Pods and Deployments are deployed in a namespace.

However, some resources are global to the cluster, such as ClusterRoles, ClusterRoleBindings, Namespaces, PersistentVolumes, Custom Resource Definitions (CRDs), etc.
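You can list which resources are cluster-scoped (rather than namespaced) in your own cluster with:

$ kubectl api-resources --namespaced=false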

If tenants can manage Persistent Volumes, they can see all persistent volumes in the cluster, not just theirs.

Those global resources are stored in the control plane, so what if you could have a control plane per tenant instead?

A control plane per tenant

#vCluster: the cost of isolated control planes

You could provide a cluster per tenant or take a lightweight approach: run a control plane as a pod in your cluster.

Tenants connect directly to the control plane in the pod and create their resources there.

Control plane as a pod

Since the control plane is just for them, you immediately remove any global resource and contention issues.

But where are the tenant's pods scheduled if all they have is a control plane?

vCluster took this approach and devised an ingenious solution: a controller that copies resources from the tenant’s control plane to the host control plane.

When you schedule a Deployment in the nested control plane, the resulting pod specs are copied to the host control plane, where they are assigned and deployed to actual nodes.

vCluster sync
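As a quick sketch of the flow (assuming a virtual cluster called my-tenant has already been created with the default settings — we'll create one in a moment — so its host namespace is vcluster-my-tenant):

$ vcluster connect my-tenant
$ kubectl create deployment nginx --image=nginx
$ vcluster disconnect

# back on the host cluster, the synced pods show up in the vcluster's namespace
$ kubectl get pods -n vcluster-my-tenant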

vCluster has some interesting trade-offs:

  • Each tenant has an entire control plane and the flexibility of a real Kubernetes cluster.
  • This control plane is only used to store resources in a database.
  • The controller can be instructed to copy only specific resources.

In other words, a careful syncing mechanism lets you selectively decide how to propagate resources from the tenant cluster.
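For example, the syncer is configured through the chart values; a minimal sketch that enables PersistentVolume syncing (the keys match the --set flag used below and may differ in newer vcluster releases):

# values.yaml
sync:
  persistentvolumes:
    enabled: true   # copy PersistentVolumes from the virtual cluster to the host

Other resource types can be toggled in the same way; the command below sets the same key inline with --set.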

Let’s test this by creating a nested control plane with:

$ vcluster create test --set 'sync.persistentvolumes.enabled=true'

Once the nested cluster is ready, save the following Persistent Volume definition as pv.yaml:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data"

And submit it to the cluster with:

$ kubectl apply -f pv.yaml
persistentvolume/task-pv-volume created

Let’s disconnect from the tenant and list all Persistent Volumes in the cluster:

$ vcluster disconnect
$ kubectl get pv
NAME                                             CAPACITY   STATUS      CLAIM
pvc-6ced7d97-c0f4-4a82-a5f8-2337907fff0b         5Gi        Bound       vcluster-test/data-test-0
vcluster-task-pv-volume-x-vcluster-test-x-test   10Gi       Available

There are two Persistent Volumes: one for the control plane and the other you just created.

What happens when we repeat the experiment with a second tenant?

$ vcluster create test2 --set 'sync.persistentvolumes.enabled=true'

You can apply the same pv.yaml as before (you'd expect the name to clash, but it doesn't):

$ kubectl apply -f pv.yaml
persistentvolume/task-pv-volume created

Let’s disconnect and inspect the host cluster:

$ vcluster disconnect
$ kubectl get pv
NAME                                               CAPACITY   STATUS      CLAIM
pvc-131f1b41-7ed3-4175-a33d-080cdff41b44           5Gi        Bound       vcluster-test2/data-test2-0
pvc-6ced7d97-c0f4-4a82-a5f8-2337907fff0b           5Gi        Bound       vcluster-test/data-test-0
vcluster-task-pv-volume-x-vcluster-test-x-test     10Gi       Available
vcluster-task-pv-volume-x-vcluster-test2-x-test2   10Gi       Available

Each tenant can only see a single Persistent Volume, but the host cluster can list all of them!

Notice how, for each tenant, you have a pod and a persistent volume.
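You can see both from the host cluster; by default, each virtual cluster lives in its own vcluster-<name> namespace:

$ kubectl get pods --all-namespaces | grep vcluster-
$ kubectl get pvc --all-namespaces | grep vcluster-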

How many resources do we need to run a cluster for 50 tenants?

We set up the following experiment:

  • We created a cluster with a node pool of 1vCPU / 2GB nodes.
  • We started with a single node.
  • We set up the cluster autoscaler.

Then, we executed the following script:

#!/bin/bash
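# Create 50 virtual clusters; --connect=false skips switching kubectl to each new cluster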

for i in {1..50}
do
  vcluster create "tenant-$i" --connect=false --upgrade
done

The cluster settled on 17 nodes.

Since each node was priced at $12/month, the nodes alone cost ~$204/month.

Each tenant's control plane also provisioned a 10GB persistent volume, charged at $1/month.
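Putting the numbers together:

  • 17 nodes x $12/month = $204/month
  • 50 persistent volumes x $1/month = $50/month
  • Total: ~$254/month, or ~$5 per tenant per month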

What if you need to segregate workloads into different clusters for regulatory reasons?

Another option is to have a dedicated cluster per tenant.

#Hard multi-tenancy: dedicated clusters with Karmada

You could use Karmada to manage the tenant clusters and deploy common workloads across all of them at once.

Karmada’s architecture is similar to that of vCluster.

First, a cluster manager control plane is aware of multiple clusters.

Karmada architecture

You usually deploy it in a dedicated cluster that doesn’t run any workloads.

Then, Karmada employs an agent that receives instructions from the Karmada control plane and forwards them to the local cluster.

In the end, you have the following arrangement:

  • The Karmada control plane can schedule workloads across all clusters (see the sketch after this list).
  • Each cluster can still deploy workloads independently.
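For example, cross-cluster scheduling from the Karmada control plane is expressed with a PropagationPolicy; a minimal sketch (the Deployment and cluster names are placeholders):

apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: common-workloads
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  placement:
    clusterAffinity:
      clusterNames:
        - tenant-1
        - tenant-2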

Of all options, this is the most expensive to maintain and operate.

How expensive exactly?

We created 51 clusters (50 tenants, 1 cluster manager) with a single node (1vCPU, 2GB).

All clusters were regional (no HA), and Akamai didn’t charge us a penny to run them.

The only cost we incurred was the single worker node for each cluster.

The total: 51 x $12 = ~$612/month, or ~$12 per tenant per month (the cost of each cluster's single worker node).

#Comparing multi-tenancy costs

The three multi-tenancy options come with vastly different levels of isolation and cost.

And these figures are just a baseline: they don't account for all the other tools you need to install.

As Artem discussed in this KubeFM episode, you must consider multi-tenancy for all tools in your cluster.

That means monitoring, logging, pipelines, etc.

Are you sharing to save on costs but risk that a failure could affect everyone, or do you invest in clear separation and face the hefty bill?

For example, take the Ingress controller we mentioned at the beginning of the article: you could have a single ingress controller for 50 tenants or 50 dedicated controllers.

The costs are drastically different, but notice how not all multi-tenancy options are created equal.

You don’t have a choice with separate clusters: you must run at least one dedicated ingress controller per cluster, so there are fewer opportunities for sharing and saving costs.

#Summary

Hosting tenants in the same Kubernetes cluster requires balancing costs and isolation.

You can opt for soft multi-tenancy and light overhead, but there is a risk that a single issue could affect all tenants.

Or you can opt to be fully isolated but install, manage and upgrade clusters for several teams.

The last option is a trade-off: a shared cluster with a dedicated control plane per tenant is an in-between solution that trades some isolation for more manageable costs.

