Kubernetes Namespaces Don't Exist

Daniele Polencic

March 4, 2024

Namespaces are one of the fundamental resources in Kubernetes.

But they don’t provide network isolation, are ignored by the scheduler and can’t limit resource usage.

Also, they are not real and don’t exist in the infrastructure.

The previous statement is at odds with the following command, though:

$ kubectl create namespace test

namespace/test created

Kubernetes created the namespace, and you can verify that:

$ kubectl get ns test -o yaml

apiVersion: v1
kind: Namespace
metadata:
  name: test

You can also create resources in this namespace:

$ kubectl run my-nginx --image=nginx -n test
pod/my-nginx created

If you were to picture a namespace, it could look like this.

But what does this grouping look like inside the cluster?

Namespaces and networking

A Kubernetes network should always satisfy two requirements:

Any pod can reach any other pod (i.e. flat network).
Pods have “stable” IP addresses.

Notice how the first already invalidates the idea of namespaces as network boundaries.

So, how can you prevent the traffic from escaping the current namespace?

You might already be familiar with Network Policies: programmable firewall rules that let you restrict how the traffic can flow in the cluster.

Network policies are enforced by the Container Network Interface (CNI) plugin, but they don’t change how pods are attached to the network (or that any pod can talk to any pod).

Instead, the network plugin writes additional firewall rules on every node, usually through an agent deployed as a DaemonSet.

The DaemonSet is responsible for downloading the NetworkPolicy, understanding the constraints (e.g. no traffic can escape the current namespace) and translating those requirements into firewall rules (usually implemented as iptables or eBPF).

In other words, the network is still unaware of namespaces.

The network plugin’s DaemonSet understands the namespace constraints and generates networking rules that assume the existence of a namespace.
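To make the translation step concrete, here is a minimal sketch of what such an agent could do for a "traffic stays inside the namespace" policy. It is a toy model, not how any real CNI plugin is implemented: the `Pod` class, the `same_namespace_rules` function and the IP addresses are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    name: str
    namespace: str
    ip: str


def same_namespace_rules(pods: list[Pod], namespace: str) -> list[tuple[str, str]]:
    """Return the (source_ip, destination_ip) pairs that stay allowed.

    The namespace is resolved to pod IPs here, inside the agent, so the
    resulting rules only mention addresses.
    """
    members = [p.ip for p in pods if p.namespace == namespace]
    return [(src, dst) for src in members for dst in members if src != dst]


# Hypothetical cluster state: two pods in "test", one in "other".
pods = [
    Pod("web", "test", "10.0.1.4"),
    Pod("db", "test", "10.0.2.7"),
    Pod("batch", "other", "10.0.1.9"),
]
rules = same_namespace_rules(pods, "test")
for src, dst in rules:
    print(f"ALLOW {src} -> {dst}")
```

The generated rules contain only IP addresses. The word "namespace" appears in the agent's logic and never in what reaches the network.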

Namespaces and scheduling workloads

The Kubernetes scheduler assigns pods to nodes through a sequence of steps:

It looks at memory and CPU requests.
It filters all applicable nodes (and discards nodes that can’t host the pod).
It scores the remaining nodes based on available space, the pods' spread, etc.

The scheduler is pluggable, and you can influence its decision with Node Affinity, Taints and Tolerations and more.

But it isn’t aware of namespaces.

This has some obvious but unintuitive consequences: all pods from a single namespace could end up on the same node.

Or, all pods from a single namespace could spread across all nodes.

In other words, placement is not guaranteed.
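The filter-and-score steps can be sketched in a few lines. This is a deliberately simplified model, not the real scheduler: `Node`, `PodSpec`, `schedule` and the "most free CPU wins" scoring rule are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu_m: int   # free CPU in millicores
    free_mem_mi: int  # free memory in MiB


@dataclass
class PodSpec:
    name: str
    namespace: str    # present on the object, never read by schedule()
    cpu_m: int
    mem_mi: int


def schedule(pod: PodSpec, nodes: list[Node]):
    """Pick a node for the pod and reserve its resources, or return None."""
    # Filter: drop nodes that cannot fit the pod's requests.
    feasible = [
        n for n in nodes
        if n.free_cpu_m >= pod.cpu_m and n.free_mem_mi >= pod.mem_mi
    ]
    if not feasible:
        return None
    # Score: prefer the node with the most CPU left (toy scoring rule).
    best = max(feasible, key=lambda n: n.free_cpu_m - pod.cpu_m)
    best.free_cpu_m -= pod.cpu_m
    best.free_mem_mi -= pod.mem_mi
    return best.name


nodes = [Node("node-a", 4000, 8192), Node("node-b", 1000, 2048)]
placements = [
    schedule(PodSpec(f"web-{i}", "test", 500, 256), nodes) for i in range(3)
]
print(placements)
```

With these numbers, all three pods from the test namespace land on node-a. Changing the namespace field would not change the outcome, because the function never reads it.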

And since the scheduler doesn’t care about namespaces, you can have as many pods as you want in one, and they can request as many resources as they wish.

To work around this issue, Kubernetes has two resources to restrict what you can deploy in a namespace: ResourceQuotas and LimitRanges.

A ResourceQuota defines the amount of resources available in the current namespace.

Let’s have a look at an example:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: quota-with-limits
spec:
  hard:
    pods: "10"
    requests.cpu: "4"
    requests.memory: 4Gi
    limits.cpu: "8"
    limits.memory: 8Gi

In this case, at most ten pods can be deployed in the current namespace, and their combined CPU and memory requests and limits cannot exceed the totals listed.

Is the scheduler enforcing those constraints?

Unfortunately not.

The scheduler isn’t aware of the namespace, so the check is implemented in the Kubernetes API server.

The API server is a complex component that processes requests through several steps.

First, it authenticates requests.
Then, it ensures you are granted permission to access the resources.
It then passes through the admission controllers.
And finally, it stores the resource in etcd.
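The four steps can be pictured as a chain of functions that a request has to pass before it is stored. The sketch below is a toy model, not API server code: the `GRANTS` set, the `STORE` dictionary (standing in for etcd) and every function name are hypothetical.

```python
# Toy request pipeline. Every name here is hypothetical.
GRANTS = {("alice", "create", "pods", "test")}  # who may do what, and where
STORE = {}  # stands in for etcd


def authenticate(request: dict) -> None:
    if not request.get("user"):
        raise PermissionError("401 Unauthorized")


def authorize(request: dict) -> None:
    key = (request["user"], request["verb"], request["resource"], request["namespace"])
    if key not in GRANTS:
        raise PermissionError("403 Forbidden")


def admit(request: dict) -> None:
    obj = request["object"]
    # Mutating admission runs first and may change the object.
    obj.setdefault("labels", {})
    # Validating admission runs second and may only accept or reject.
    if not obj.get("name"):
        raise ValueError("admission denied: missing name")


def store(request: dict) -> tuple:
    key = (request["resource"], request["namespace"], request["object"]["name"])
    STORE[key] = request["object"]
    return key


def handle(request: dict) -> tuple:
    # A failure in any step stops the request before it is stored.
    for step in (authenticate, authorize, admit):
        step(request)
    return store(request)


request = {
    "user": "alice", "verb": "create", "resource": "pods",
    "namespace": "test", "object": {"name": "my-nginx"},
}
print(handle(request))
```

In this model, the namespace is just another field on the request that the authorization and storage steps happen to read.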

Admission controllers are where requests can be mutated or validated before they are stored.

There are two admission controllers, Validating and Mutating, each with several default plugins enabled.

Among them, the Validating Admission Controller has a ResourceQuota plugin.

This plugin is used to check that the resources in the current namespace meet the requirement of the ResourceQuota.

ResourceQuota plugin in the Validating Admission Controller

Excellent, we found where the quota is enforced.

But there’s a gotcha.

The controller can check the ResourceQuota constraints only if the pod declares requests and limits; when the quota covers them, the API server rejects pods that leave them out.
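Here is a minimal sketch of the kind of arithmetic such a check involves, limited to the pod count and CPU. It is an illustration under stated assumptions, not the plugin's code: `cpu_to_millicores`, `check_quota`, the error messages and the usage numbers are invented for the example.

```python
def cpu_to_millicores(quantity: str) -> int:
    """Parse the two CPU notations used in this article: "4" and "500m"."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def check_quota(pod: dict, used: dict, hard: dict) -> None:
    """Raise ValueError if admitting `pod` would break the quota.

    `pod` maps "requests.cpu" and "limits.cpu" to quantities. `used` and
    `hard` hold the namespace totals under the same keys, plus "pods".
    """
    if used["pods"] + 1 > int(hard["pods"]):
        raise ValueError("exceeded quota: pods")
    for key in ("requests.cpu", "limits.cpu"):
        if key not in pod:
            # Without a value there is nothing to add to the running total.
            raise ValueError(f"must specify {key}")
        total = cpu_to_millicores(used[key]) + cpu_to_millicores(pod[key])
        if total > cpu_to_millicores(hard[key]):
            raise ValueError(f"exceeded quota: {key}")


# Quota values from the example above; the usage figures are hypothetical.
hard = {"pods": "10", "requests.cpu": "4", "limits.cpu": "8"}
used = {"pods": 3, "requests.cpu": "3500m", "limits.cpu": "7"}

check_quota({"requests.cpu": "500m", "limits.cpu": "1"}, used, hard)
print("admitted")
```

The check runs when the request is submitted, long before the scheduler sees the pod. A pod that omits requests and limits fails it regardless of how much room the namespace has left.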

You should define some defaults to ensure those are always set.

Those are specified with a LimitRange object like this:

apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-resource-constraint
spec:
  limits:
  - default:
      cpu: 500m
    defaultRequest:
      cpu: 500m
    max:
      cpu: "1"
    min:
      cpu: 100m
    type: Container

What happens when you submit a pod without limits to the cluster?

This time, the Mutating Admission Controller has a LimitRanger plugin that checks the incoming resource and amends the definition with the default requests and limits.

The LimitRange can also define max and min for requests and limits.

And those are enforced by the Validating Admission Controller.

The LimitRanger is a unique plugin that is present in two places: in the Mutating and Validating controllers.
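The two roles can be sketched as two small functions: one that fills in missing values and one that checks the bounds. This is a simplified model using the CPU values from the LimitRange above, expressed in millicores. The function names and the dictionary shape are assumptions, and the real plugin applies more rules than this.

```python
# CPU values from the LimitRange above, in millicores.
LIMIT_RANGE = {"default": 500, "defaultRequest": 500, "max": 1000, "min": 100}


def apply_defaults(container: dict) -> dict:
    """Mutating step: fill in a missing CPU request or limit."""
    patched = dict(container)
    patched.setdefault("limit", LIMIT_RANGE["default"])
    patched.setdefault("request", LIMIT_RANGE["defaultRequest"])
    return patched


def validate(container: dict) -> list[str]:
    """Validating step: collect violations of min and max."""
    errors = []
    if container["request"] < LIMIT_RANGE["min"]:
        errors.append("request below min")
    if container["limit"] > LIMIT_RANGE["max"]:
        errors.append("limit above max")
    return errors


# A container submitted without any CPU settings.
container = apply_defaults({})
print(container, validate(container))
```

A container submitted with only `{"limit": 2000}` would receive the default request in the first step and then fail the second one, because its limit exceeds the max.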

Now that we’ve discussed how the scheduler is (not) namespace-aware, let’s look at permissions.

RBAC and namespaces

In Kubernetes, you can grant access to resources using three objects:

You have users or Service Accounts as identities.
You define the rules of what they are allowed to do (those are called Roles or ClusterRoles).
You link users or Service Accounts to Roles with bindings (i.e. you use RoleBindings for Roles and ClusterRoleBindings for ClusterRoles).

It’s important to remember that:

ClusterRoles and ClusterRoleBindings grant access to resources regardless of the namespace.
Roles and RoleBindings are instead scoped to a single namespace.
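The difference in scope can be sketched with a single lookup function. This is a toy model rather than the real authorizer: it collapses a role and its binding into one hypothetical `Binding` record, and it uses `namespace=None` to stand for a cluster-wide grant.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Binding:
    subject: str
    verbs: frozenset
    resources: frozenset
    namespace: Optional[str]  # None models a ClusterRoleBinding


def is_allowed(bindings, subject, verb, resource, namespace) -> bool:
    """Return True if any binding grants the verb on the resource there."""
    for b in bindings:
        if b.subject != subject or verb not in b.verbs or resource not in b.resources:
            continue
        # A namespaced grant only matches requests for its own namespace.
        if b.namespace is None or b.namespace == namespace:
            return True
    return False


# Hypothetical grants: one scoped to "test", one cluster-wide.
bindings = [
    Binding("dev", frozenset({"get", "list"}), frozenset({"pods"}), "test"),
    Binding("admin", frozenset({"get", "list", "delete"}), frozenset({"pods"}), None),
]

print(is_allowed(bindings, "dev", "list", "pods", "test"))
print(is_allowed(bindings, "dev", "list", "pods", "default"))
print(is_allowed(bindings, "admin", "delete", "pods", "default"))
```

The namespace is only a string compared during the lookup: the dev user can list pods in test but not in default, while the admin grant matches everywhere.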

It’s nice to see that namespaces are considered this time — unlike when we looked at networking and scheduling.

But if you dig deeper, things start to make more sense.

Every time the Kubernetes API involves namespaces, they are suddenly relevant again.

If you look at the rest of the cluster (e.g., networking and scheduling), namespaces are nowhere to be found.

Namespaces don’t exist in the infrastructure: they are only records stored in the database to group (you could say namespace) resources.
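One way to picture this is to treat the database as a key-value store where the namespace is a segment of the key. The sketch below assumes a `/registry/<resource>/<namespace>/<name>` layout, which mirrors how pods are commonly keyed in etcd. Treat the exact paths as an assumption, since other resources may be laid out differently. The `storage_key` and `list_namespace` helpers are hypothetical.

```python
def storage_key(resource: str, name: str, namespace=None) -> str:
    """Build a key; cluster-scoped objects have no namespace segment."""
    parts = ["/registry", resource]
    if namespace is not None:
        parts.append(namespace)
    parts.append(name)
    return "/".join(parts)


# A dictionary standing in for the database.
db = {
    storage_key("namespaces", "test"): {"kind": "Namespace"},
    storage_key("pods", "my-nginx", "test"): {"kind": "Pod"},
    storage_key("pods", "my-nginx", "default"): {"kind": "Pod"},
}


def list_namespace(db: dict, resource: str, namespace: str) -> list[str]:
    """List keys of one resource type in one namespace via a prefix scan."""
    prefix = storage_key(resource, "", namespace)  # ends with a slash
    return sorted(key for key in db if key.startswith(prefix))


print(list_namespace(db, "pods", "test"))
```

In this model, the Namespace object is one more record, and "being in a namespace" means sharing a key prefix. Two pods with the same name can coexist because their keys differ.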

And this has some additional consequences.

Multi-tenancy and namespaces

Namespaces are not designed for multi-tenancy, and it shows when you focus on shared Kubernetes components such as CoreDNS.

Nothing stops one tenant from overloading the DNS and affecting all other tenants.

CoreDNS is aware of namespaces because it lets you reach Services using the <service-name>.<namespace-name> domain.

However, requests to CoreDNS are not namespace-aware, so a single pod from any namespace could send hundreds of requests like this:

$ dig my-service.default.svc.cluster.local

;; QUESTION SECTION:
;my-service.default.svc.cluster.local. IN    A

;; ANSWER SECTION:
my-service.default.svc.cluster.local. 30 IN A    10.100.0.1

You cannot do much unless you augment CoreDNS with extra plugins.

You might experience the same with the Kubernetes API: a tenant overwhelming the API with requests.

In this case, things are a bit more complex as the Kubernetes API has a mechanism to prevent request overloading.

But the possibility is so real that Kubernetes includes API Priority and Fairness, a feature that became stable in Kubernetes 1.29 and is designed to mitigate this scenario.

Namespaces and control planes

Hopefully, this article helped you realize that Kubernetes isn’t the perfect multi-tenant platform, and some abstractions usually used to partition resources are less bulletproof than one might think.

But there are workarounds.

In this article, you saw that most constraints around multi-tenancy are related to shared components such as the Kubernetes API and CoreDNS.

One way to work around this is to provide a control plane per tenant.

Does that mean you need a cluster per tenant?

Not really.

vCluster is a popular tool that encapsulates a control plane into a pod within an existing cluster.

You interact with your control plane and are isolated from others.

All workloads are still deployed in the parent cluster, so you still benefit from sharing computing resources.

A similar option is HyperShift, a tool that lets you deploy control planes as pods.

HyperShift is slightly different, as you are expected to provision and manage the nodes that are eventually attached to the control plane, but the outcome is the same: isolated control planes.