Kubernetes AI Pipelines with vCluster and Kubeflow Tutorial

Lukas Gentele
Damaso Sanoja

AI pipelines streamline the process of building and deploying machine learning (ML) workflows, enhancing scalability and portability. Kubernetes plays a pivotal role in managing these pipelines, allowing for effortless deployment, scaling, and operations of application containers across clusters of hosts. Kubeflow, built on Kubernetes, simplifies the deployment of ML workflows even further. As a cloud-native platform inspired by Google’s internal ML pipelines, Kubeflow provides an array of components, including a central dashboard and notebooks, to manage and scale machine learning models effectively and make AI pipelines more accessible and efficient.

You may have already noticed a pattern here. When it comes to deploying ML workflows, scalability and efficiency are paramount. This is where vCluster comes into play.

In this tutorial, you’ll discover how vCluster can take the benefits of building AI pipelines in Kubernetes to the next level by leveraging a single Kubernetes cluster to quickly launch virtual environments for testing, development, or production.

#Benefits of vCluster for AI Pipelines

You can think of vCluster as a powerful open source tool for creating virtual Kubernetes clusters. These virtual clusters are fully functional Kubernetes clusters that run on top of a host cluster while remaining isolated from it. The benefits of this approach include:

  • Better cost control: You can launch or destroy virtual clusters in seconds, so your team can spin up ML development and testing environments without scaling the entire physical cluster, improving resource utilization and lowering overall costs.
  • Security: Virtual clusters provide stronger resource isolation and access control, ensuring that only authorized users can access them.
  • Platform agnosticism: Regardless of whether your organization needs to set up clusters locally or remotely, and regardless of whether it's a multicluster or hybrid deployment, your team can deploy virtual clusters with the same ease.

In other words, vCluster allows you to optimize the resources of your Kubernetes cluster and create isolated and secure environments where you can build AI pipelines. On top of all this, setting up a virtual cluster is very straightforward.

#Setting Up vCluster for Kubernetes-Based AI Pipelines

This section will cover the three-step process for using vCluster for ML pipelines: installing the vcluster CLI, configuring vCluster for Kubernetes-based AI pipelines, and deploying virtual clusters within the Kubernetes cluster.

#Installing the vCluster CLI

Before installing the vCluster CLI, ensure you have:

  • kubectl and Helm command line tools installed on your local workstation
  • A kubeconfig file with the appropriate credentials to access the Kubernetes cluster where you want to deploy virtual clusters

Once you have those, you can download and install the latest version of vCluster using the appropriate script for your architecture. You can also download the vCluster binary manually from the GitHub repository if you want to build it from source or install a beta version.

For example, if you use a Mac, you can install vCluster using Homebrew:

brew install vcluster
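
If you're on Linux, a typical approach is to download the binary for your architecture from the GitHub releases page and move it into your PATH (the example below assumes an amd64 machine; adjust the suffix to match your architecture):

# Download the latest vcluster binary and install it into /usr/local/bin
curl -L -o vcluster "https://github.com/loft-sh/vcluster/releases/latest/download/vcluster-linux-amd64"
sudo install -c -m 0755 vcluster /usr/local/bin && rm -f vcluster

You can then verify the installation with vcluster --version.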

#Configuring vCluster for Kubernetes-Based AI Pipelines

Now that you have the vcluster CLI, you can start to configure vCluster according to your project needs.

vCluster uses Helm charts to deploy virtual clusters, meaning you can edit values.yaml to customize its configuration. Start by adding the loft-sh Helm repository to your machine:

helm repo add loft-sh https://charts.loft.sh
helm repo update

Then, search for available vCluster charts:

$ helm search repo loft-sh | grep vcluster
NAME                   CHART VERSION  APP VERSION  DESCRIPTION
...
loft-sh/vcluster       0.15.7         0.15.7       vcluster - Virtual Kubernetes Clusters
loft-sh/vcluster-eks   0.15.7         0.15.7       vcluster - Virtual Kubernetes Clusters (eks)
loft-sh/vcluster-k0s   0.15.7         0.15.7       vcluster - Virtual Kubernetes Clusters (k0s)
loft-sh/vcluster-k8s   0.15.7         0.15.7       vcluster - Virtual Kubernetes Clusters (k8s)
...

Let’s say you choose the default chart, loft-sh/vcluster, which uses K3s. In that case, you can download and extract the chart by running this command:

helm pull loft-sh/vcluster && tar -xvf vcluster-0.15.7.tgz

You can now navigate to the vcluster directory and edit values.yaml. While each project’s requirements are different, there is some common ground. For example, depending on the components you use in your ML pipeline, you may need to edit values.yaml to enable services like CoreDNS, ServiceLB, or Metrics Server, which are disabled by default:

# Virtual cluster (K3s) configuration
vcluster:
  # Image to use for the virtual cluster
  image: rancher/k3s:v1.27.3-k3s1
  command:
    - /bin/k3s
  baseArgs:
    - server
    - --write-kubeconfig=/data/k3s-config/kube-config.yaml
    - --data-dir=/data
    - --disable=traefik,servicelb,metrics-server,local-storage,coredns
    - --disable-network-policy
    - --disable-agent
    - --disable-cloud-controller
    - --flannel-backend=none
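
For example, one way to enable the Metrics Server is to remove it from the --disable list in baseArgs. Here is a sketch of the edited values.yaml (the other arguments stay as shown above):

vcluster:
  baseArgs:
    - server
    - --write-kubeconfig=/data/k3s-config/kube-config.yaml
    - --data-dir=/data
    # metrics-server removed from the disable list so K3s deploys it
    - --disable=traefik,servicelb,local-storage,coredns
    - --disable-network-policy
    - --disable-agent
    - --disable-cloud-controller
    - --flannel-backend=none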

Regarding the Metrics Server in particular, proper monitoring and logging within your virtual cluster also requires changing the synchronization mode of the nodes so that the virtual cluster sees real node specs rather than the default pseudo nodes.
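
A minimal sketch of that change, assuming the sync options exposed by the vCluster 0.15 chart:

# Sync real node objects into the virtual cluster instead of the
# default pseudo nodes so that monitoring tools report accurate specs
sync:
  nodes:
    enabled: true
    # Optionally sync all host nodes, not only those running this vCluster's pods
    syncAllNodes: true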

Besides monitoring, another crucial aspect of production environments is high availability (HA). vCluster offers high-availability support for both K3s and vanilla Kubernetes. Implementing high availability with vCluster also involves enabling external data storage for K3s, which is covered in detail in the documentation.

Here is an example of HA configuration for K3s:

# Enable HA mode
enableHA: true

# Scale up K3s replicas
replicas: 2

# Set external data store endpoint
vcluster:
  env:
    - name: K3S_DATASTORE_ENDPOINT
      value: mysql://username:password@tcp(hostname:3306)/database-name

# Disable persistent storage as all data (including bootstrap data) is stored in external data store
storage:
  persistence: false

# Scale up CoreDNS replicas
coredns:
  replicas: 2

As you can see, the example uses MySQL as the external data store endpoint for the cluster. You may also notice that K3s and CoreDNS replicas are scaled up. Speaking of scaling virtual clusters, the number of replicas is not the only aspect to consider. When it comes to running ML pipelines, you must ensure that the virtual cluster has sufficient resources. The following are the default values that vCluster uses for K3s-based virtual clusters:

# Virtual cluster (K3s) configuration
vcluster:
...
  env: []
  resources:
    limits:
      memory: 2Gi
    requests:
      cpu: 200m
      memory: 256Mi
...
# Storage settings for the virtual cluster
storage:
  # If this is disabled, vCluster will use an emptyDir instead
  # of a PersistentVolumeClaim
  persistence: true
  # Size of the persistent volume claim
  size: 5Gi
...

By editing these values, you can increase memory, CPU, and storage as needed. When assigning resources to your virtual cluster, keep in mind that vCluster has a low overhead on the host by design, so you don’t have to worry about wasting resources. Moreover, you can quickly launch, update, and destroy virtual clusters, allowing you to experiment and adjust parameters more freely.
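
For illustration, a hypothetical override for a training-heavy pipeline might raise these defaults as follows (the figures below are placeholders, not recommendations; size them against your actual workload):

# Example overrides for a resource-hungry ML pipeline (illustrative values)
vcluster:
  resources:
    limits:
      memory: 8Gi
    requests:
      cpu: "1"
      memory: 2Gi

storage:
  persistence: true
  # Larger PVC for datasets and pipeline artifacts
  size: 50Gi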

Aside from resource allocation, another aspect to consider is maintaining consistency across pipelines when configuring vCluster.

In that sense, following GitOps best practices is recommended. A possible strategy would be to use Terraform’s Helm provider to deploy virtual clusters using Helm. Another strategy would be to implement vCluster’s Cluster API provider to create virtual clusters programmatically. In both cases, all changes could be tracked using Git, allowing your team to roll back any settings quickly. Beyond helping maintain consistency across pipelines, these tools allow your team to automate the deployment of ML pipelines.
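
Whichever tool you pick, the release it manages ultimately boils down to a Helm install of the vcluster chart against a values file tracked in Git — for example (the namespace and file name below are placeholders):

helm upgrade --install my-vcluster loft-sh/vcluster \
  --namespace team-ml --create-namespace \
  --values vcluster-values.yaml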

Overall, vCluster provides great flexibility to customize virtual clusters. As you’ll see below, this flexibility will allow you to deploy virtual clusters with different configurations and thus optimize resources.

#Deploying a Virtual Cluster Using the vCluster CLI

Deploying a virtual cluster with the vcluster CLI is as simple as running:

vcluster create my-vcluster

However, that would deploy K3s using the default values from values.yaml. Suppose you need to create a virtual cluster that will only run the experimental phase of your ML pipeline. You could use the following command:

vcluster create testing-pipeline-01 -f testing-config-01.yaml

This command points Helm at the configuration file testing-config-01.yaml (a values file like the values.yaml you edited earlier) when setting up the virtual cluster.
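
As an illustration, a hypothetical testing-config-01.yaml for a short-lived experimentation cluster might look like this (the names and figures are made up for the example):

# Hypothetical values for an ephemeral experimentation cluster
vcluster:
  resources:
    limits:
      memory: 4Gi
    requests:
      cpu: 500m
      memory: 512Mi

# Ephemeral cluster: skip the PersistentVolumeClaim
storage:
  persistence: false

Furthermore, nothing prevents you from deploying other Kubernetes distributions using the --distro flag: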

vcluster create prod-pipeline-02 -f prod-config-02.yaml --distro k8s

Simply put, deploying a virtual cluster is trivial once you've configured it properly by following the recommendations discussed above.

#Building AI Model Pipelines with Kubeflow and vCluster

vCluster’s convenience and versatility are applicable to countless use cases. However, you can make use of your newfound knowledge for a specific scenario: building an AI model pipeline with vCluster.

Kubeflow’s modular architecture allows data scientists, ML engineers, and operations teams to build and deploy portable and scalable ML workflows using Kubeflow Pipelines (KFP), along with the tools that they consider necessary, such as Jupyter, PyTorch, TensorFlow, and Katib, among others. In other words, Kubeflow leaves it up to each organization which components to use based on their particular use case.

vCluster fits perfectly into this context, as it can be configured to create tailored development, test, and production environments for machine learning systems.

Suppose your MLOps team requires a full Kubeflow installation with all components included in the Kubeflow Manifests repo.

To this end, you could deploy a virtual cluster with the required resources:

vcluster create kubeflow -f kubeflow.yaml

Then, connect to the newly created kubeflow virtual cluster and install Kubeflow.
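
With the vCluster CLI, connecting is a single command that switches your current kubeconfig context to the virtual cluster (run vcluster disconnect to switch back to the host context):

vcluster connect kubeflow

Once connected, you can install all Kubeflow components with a single command: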

while ! kustomize build example | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done

Keep in mind that you will need Kustomize 5+ installed on your local machine to run the command. Alternatively, you could use kustomize to install only the individual components you need.

Now, if you only need to create a basic AI model pipeline with Kubeflow, you could install a lightweight standalone version of Kubeflow Pipelines instead of a full-fledged Kubeflow deployment.

The procedure would be similar. First, deploy a customized virtual cluster:

vcluster create kfp -f kfp.yaml

Then, connect to the virtual cluster and run the following command:

export PIPELINE_VERSION=2.0.1
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=$PIPELINE_VERSION"
kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/dev?ref=$PIPELINE_VERSION"

In both cases, you can expose the virtual cluster externally by configuring an ingress resource or a LoadBalancer service in values.yaml.
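
Here is a minimal sketch using the vcluster chart's service and ingress options (the hostname is a placeholder, and your host cluster needs an ingress controller or a cloud load balancer for these to take effect):

# Expose the vCluster API server through a LoadBalancer service
service:
  type: LoadBalancer

# Or through an ingress (requires an ingress controller on the host)
ingress:
  enabled: true
  host: vcluster.example.com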

To recap, the way to build AI pipelines within a virtual cluster is the same as for a non-virtualized Kubernetes cluster, which is a big plus. No steep learning curve, no extra work. The only thing engineers have to do is configure the virtual cluster according to their needs, which they can easily do with Helm.

#Conclusion

All in all, vCluster’s role in building next-generation AI pipelines is promising. Its incredible versatility allows MLOps teams to use a comprehensive Kubernetes-native platform like Kubeflow as well as any other tooling they require to develop cutting-edge ML models. Moreover, vCluster’s ability to replicate the behavior of “real” Kubernetes clusters is significant as AI development moves towards more complex, distributed systems that demand robust pipeline versioning, enhanced collaboration features, and the integration of more sophisticated AI models. Overall, vCluster makes it easy to create development and testing environments that can be provisioned instantly without scaling the entire physical cluster, thus saving time and money.

That said, the most exciting aspect is that vCluster is also future-proof. As the ability to build and manage AI pipelines across different cloud environments becomes increasingly vital, your organization can count on vCluster. Regardless of whether you adopt a multicloud or hybrid cloud strategy, your organization will be able to deploy virtual clusters without breaking a sweat.
