Virtual Clusters (vClusters)

In Loft, users can create virtual Kubernetes clusters which are fully functional Kubernetes clusters but they run inside a namespace of another cluster.

Why Virtual Clusters?

Virtual clusters:

  • are much cheaper than "real" clusters (shared resources and sleep mode just like for namespaces)
  • can be created and cleaned up again in seconds (great for CI/CD or testing)
  • allow to test different Kubernetes versions inside a single host cluster which may have a different version than the virtual clusters
  • more powerful than simple namespace (virtual clusters allow users to use CRDs etc.)
  • allow users to install apps which require cluster-wide permissions while being limited to actually just one namespace within the host cluster
  • provide strict isolation while still allowing you to share certain services of the underlying host cluster (e.g. using the host's ingress-controller and cert-manager)

Workflows

Create vCluster

Loft UI - Create Virtual Cluster

Use vCluster

Run this command to add a kube-context for the virtual cluster to your local kube-config file or to switch to an existing kube-context of a virtual cluster:

loft use vcluster # shows a list of all available vclusters
loft use vcluster [VCLUSTER_NAME] # optional flags: --cluster=[CLUSTER_NAME] --space [VCLUSTER_NAMESPACE]

Then, run any kubectl command within the virtual cluster:

kubectl get namespaces

Delete vCluster

Run this command to delete a virtual cluster:

loft delete vcluster [VCLUSTER_NAME] # optional flags: --cluster=[CLUSTER_NAME] --space [VCLUSTER_NAMESPACE]


Advanced Configuration

You can configure virtual clusters using the VirtualCluster CRD or using the Loft UI. Under the hood, Virtual clusters are deployed using a Helm chart (see the virtual-cluster project on GitHub). You are able to change how the virtual cluster is deployed via the Advanced Options section in the Loft UI.

Networking

By default, Services and Ingresses are synced from the virtual cluster to the host cluster in order to enable correct network functionality for the virtual cluster.

That means that you can create ingresses in your virtual cluster to make services available via domains using the ingress-controller that is running in the host cluster. This helps to share resources across different virtual clusters and is easier for users of virtual clusters because otherwise, they would need to install an ingress-controller and configure DNS for each virtual cluster.

Because the syncer keeps annotations for ingresses, you may also set the cert-manager ingress annotations to use the cert-manager in your host cluster to automatically provision SSL certificates from Let's Encrypt.

If you do not want ingresses to be synced and instead use a separate ingress controller within a virtual cluster, add the following syncer configuration in the virtual cluster advanced options:

syncer:
extraArgs: ["--disable-sync-resources=ingresses"]

Syncer

By default, certain Kubernetes objects such as Services or Ingresses are synced from the virtual cluster to the host cluster. The syncer removes problematic fields (such as pod labels) and also transforms certain fields of these objects before sending them to the host cluster.

You can configure additional syncer options using the following options:

syncer:
extraArgs:
- --disable-sync-resources=ingresses # any k8s resources not to sync to the host cluster
env: []
volumeMounts:
- mountPath: /data
name: data
resources: {}
# Create a cluster role for node syncing
# rbac:
# clusterRole:
# create: true

The following flags are available:

--bind-address string The address to bind the server to (default "0.0.0.0")
--client-ca-cert string The path to the client ca certificate (default "/data/server/tls/client-ca.crt")
--disable-sync-resources string The resources that shouldn't be synced by the virtual cluster (e.g. --disable-sync-resources=ingresses,nodes)
--enable-storage-classes If enabled, the virtual cluster will sync storage classes (make sure rbac.clusterRole.create is true in the options)
--enforce-node-selector If enabled and --node-selector is set then the virtual cluster will ensure that no pods are scheduled outside of the node selector (default true)
--fake-nodes If enabled, the virtual cluster will create fake nodes instead of copying the actual physical nodes config (default true) (if false make sure rbac.clusterRole.create is true in the options)
--fake-persistent-volumes If enabled, the virtual cluster will create fake persistent volumes instead of copying the actual physical persistent volumes config (default true) (if false make sure rbac.clusterRole.create is true in the options)
--kube-config string The path to the virtual cluster admin kube config (default "/data/server/cred/admin.kubeconfig")
--log-flush-frequency duration Maximum number of seconds between log flushes (default 5s)
--node-selector string If set, nodes with the given node selector will be synced to the virtual cluster. This will implicitly set --fake-nodes=false (make sure rbac.clusterRole.create is true in the options)
--out-kube-config-secret string If specified, the virtual cluster will write the generated kube config to the given secret (default "kubeconfig")
--owning-statefulset string If configured, all synced resources will have this statefulset as owner reference
--port int The port to bind to (default 8443)
--request-header-ca-cert string The path to the request header ca certificate (default "/data/server/tls/request-header-ca.crt")
--server-ca-cert string The path to the server ca certificate (default "/data/server/tls/server-ca.crt")
--server-ca-key string The path to the server ca key (default "/data/server/tls/server-ca.key")
--service-name string The service name where the virtualcluster proxy will be available (default "virtualcluster")
--suffix string The suffix to append to the synced resources in the namespace (default "suffix")
--sync-all-nodes If enabled and --fake-nodes is false, the virtual cluster will sync all nodes instead of only the needed ones
--target-namespace string The namespace to run the virtual cluster in (defaults to current namespace)
--tls-san strings Add additional hostname or IP as a Subject Alternative Name in the TLS cert
--translate-image strings Translates image names from the virtual pod to the physical pod (e.g. --translate-image=coredns/coredns=mirror.io/coredns/coredns)

Control Plane (k3s)

You can configure the control plane (API server, controller-manager, etcd) using the following options:

virtualCluster:
image: rancher/k3s:v1.18.4-k3s1
command:
- /bin/k3s
baseArgs:
- server
- --write-kubeconfig=/k3s-config/kube-config.yaml
- --data-dir=/data
- --disable=traefik,servicelb,metrics-server,local-storage
- --disable-network-policy
- --disable-agent
- --disable-scheduler
- --disable-cloud-controller
- --flannel-backend=none
- --kube-controller-manager-arg=controllers=*,-nodeipam,-nodelifecycle,-persistentvolume-binder,-attachdetach,-persistentvolume-expander,-cloud-node-lifecycle
extraArgs:
- --service-cidr=10.96.0.0/12
volumeMounts:
- mountPath: /data
name: data
env: []
resources: {}

Note that you can use any k3s version and all arguments/flags that the k3s binary of this specific version provides. See k3s for more information.

Monitoring

In general, monitoring inside a virtual cluster works as monitoring a normal kubernetes cluster, except that kubelet cadvisor and probes metrics cannot be scraped directly, since the namespace and pod names in the scraped metrics refer to the host kubernetes cluster names instead of the actual virtual cluster names. However, virtual clusters provide custom kubelet metrics endpoints that collect the metrics from the host kubernetes cluster and automatically filters and rewrites the metrics so that the pod and namespace names fit.

To configure prometheus to scrape the rewritten kubelet metrics, follow these steps:

  1. Make sure the virtual cluster can access nodes data. This can be enabled in the advanced options of clusters with:
# Below you can configure the virtual cluster
...
# This option will deploy a ClusterRole & ClusterRoleBinding
# that allows the virtual cluster to access node metrics
rbac:
clusterRole:
create: true
...
  1. Make sure your prometheus (that is running inside of the virtual cluster) can access the metrics endpoints of the virtual cluster (inside the virtual cluster it is accessed through the default/kubernetes service). This can be done by either editing the already deployed prometheus ClusterRole, or creating a new one:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: virtual-cluster-prometheus-role
subjects:
- kind: ServiceAccount
# SERVICE ACCOUNT NAME OF THE PROMETHEUS HERE
name: prometheus-kube-prometheus-prometheus
# NAMESPACE OF THE PROMETHEUS HERE
namespace: prometheus
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: virtual-cluster-prometheus-role
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: virtual-cluster-prometheus-role
rules:
- verbs:
- get
nonResourceURLs:
- /metrics
- /metrics/cadvisor
- /metrics/probes
  1. Make sure the endpoints and service default/kubernetes have the correct labels:
# Label service
kubectl label --overwrite services kubernetes --namespace default k8s-app=kubelet
# Label endpoints
kubectl label --overwrite endpoints kubernetes --namespace default k8s-app=kubelet
  1. Deploy the service monitor into the prometheus namespace:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: prometheus-kube-kubelet
namespace: prometheus
spec:
endpoints:
- bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
honorLabels: true
path: /metrics/cadvisor
port: https
relabelings:
- sourceLabels:
- __metrics_path__
targetLabel: metrics_path
scheme: https
tlsConfig:
caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecureSkipVerify: true
- bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
honorLabels: true
path: /metrics/probes
port: https
relabelings:
- sourceLabels:
- __metrics_path__
targetLabel: metrics_path
scheme: https
tlsConfig:
caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecureSkipVerify: true
jobLabel: k8s-app
namespaceSelector:
matchNames:
- default
selector:
matchLabels:
k8s-app: kubelet

Now prometheus should be able to scrape the correct kubelet metrics for your virtual cluster and all grafana dashboards should work as expected. These steps also work for virtual clusters inside virtual clusters.



How does it work?

The basic idea of a virtual cluster is to spin up a functional but incomplete new Kubernetes cluster within an existing cluster and sync certain needed core resources between the two clusters. The virtual cluster itself only consists of the core Kubernetes components: API server, controller manager and etcd. Besides some core kubernetes resources like pods, services, persistentvolumeclaims etc. that are needed for actual execution, all other kubernetes resources (like unbound configmaps, unbound secrets, deployments, replicasets, resourcequotas, clusterroles, crds, apiservices etc.) are purely handled within the virtual cluster and NOT synced to the host cluster. This makes it possible to allow each user access to a complete own kubernetes cluster with full functionality, while still being able to separate them in namespaces in the actual host cluster.

A virtual Kubernetes cluster in Loft is tied to a single namespace. The virtual cluster and hypervisor run within a single pod that consists of two parts:

  • a k3s instance which contains the Kubernetes control plane (API server, controller manager and etcd) for the virtual cluster
  • an instance of a virtual cluster hypervisor which is mainly responsible for syncing cluster resources between the k3s powered virtual cluster and the underlying host cluster
vCluster Architecture
vCluster Architecture Overview

Host Namespace

All Kubernetes objects of a virtual cluster are encapsulated in a single namespace within the underlying host cluster, which allows system admins to restrict resources of a virtual cluster via account/resource quotas or limit ranges. Because the virtual cluster uses a cluster id suffix for each synced resource, it is also possible to run several virtual clusters within a single namespace, without them interfering with each other. Furthermore it is also possible to run virtual clusters within virtual clusters.

Control Plane

The virtual cluster technology of Loft is based on k3s to reduce virtual cluster overhead. k3s is a certified Kubernetes distribution and allows Loft to spin up a fully functional Kubernetes cluster using a single binary while disabling all unnecessary Kubernetes components like the pod scheduler which is not needed for the virtual cluster because pods are actually scheduled in the underlying host cluster.

Hypervisor

Besides k3s, there is a Kubernetes hypervisor that emulates a fully working Kubernetes setup in the virtual cluster. Besides the synchronization of virtual and host cluster resources, the hypervisor also redirects certain Kubernetes API requests to the host cluster, such as port forwarding or pod / service proxying. It essentially acts as a reverse proxy for the virtual cluster.