Table of Contents
Kubernetes Platform Series
- Part 1: Kubernetes Self-service
- Part 2: Kubernetes Multi-tenancy
- Part 3: Kubernetes Cost Optimization with Virtual Clusters
Welcome to cloud native paradise, where development teams can grab exactly the pre-configured Kubernetes clusters they need from a self-service engineering platform, ready to scale across hybrid cloud infrastructure in a multi-tenant fashion wherever they are deployed.
Self-service provisioning and multi-tenancy eliminate needless toil and resource constraints from software delivery. Dev teams waste no valuable time getting the clusters they need, when they need them, which also makes delivery cheaper.
We can therefore plan to impress the CFO with a better ratio of total value delivered to the business versus total cost of ownership for the software. Problem solved!
But like everything else in software development, we must expect the unexpected. Developers may still find themselves mucking with infrastructure instead of coding, and surprises inevitably await when it comes time to pay the cloud bill for dev, test, staging, and production environments.
How should we balance cost and performance goals when scaling up cloud native environments?
Cloud costs: hard to measure, hard to evaluate
By my last count, there are dozens of vendors claiming to reduce public cloud compute and storage fees, mostly through various forms of consumption limits applied at the account interface.
That’s useful for a one-time cost improvement, but it fails to account for the complexity and hidden costs of giving a cloud native development team environments sophisticated enough to meet its needs at scale. This leaves platform teams with several open questions:
- How do you optimize the cost of entry for assembling that ‘golden path’ cloud native architecture?
- Is redundant labor required to configure infrastructure-as-code scripts and permissions for self-service environments if they don’t come out fully baked?
- Beyond cloud fees, what are the support costs of maintaining so many dev and staging labs atop an ever-changing Kubernetes stack?
Obviously, a simple cost equation based on fees and licenses tells only half the story about the value of improving software delivery productivity, reducing toil, and preventing talent churn.
How do organizations arrive at the real value of cost and performance metrics?
Taking a FinOps view of challenges
FinOps has emerged as a discipline to address the see-saw problem of balancing the CFO’s budget constraints with the CIO’s technology delivery requirements by weighing the enterprise’s technology spending decisions against the value of the business outcomes those investments generate.
The costly environmental pollution of server, container, and VM sprawl hounds mature organizations, where proof-of-concept deployments and experimental test environments are often forgotten or abandoned at the end of every project, leaving a lengthy housecleaning behind.
The introduction of cloud computing and SaaS vendors allowed companies to replace big capital (or capex) outlays for datacenters, hardware and enterprise silos with pay-as-you-go operational expenses (or opex) for resources that could elastically scale in capacity and cost.
This honeymoon didn’t last forever, as cloud environment costs ballooned by year-over-year multiples in many cases. Companies realized they needed even tighter FinOps oversight of opex than they had once applied to capex purchases.
Developers naturally want cloud native environments on demand, scaled to their exact needs. To avoid waiting for clusters to spin up, they build and provision multiple clusters in AWS to support each use case, then leave them running 24/7, each with its own EC2 control plane and worker nodes.
What a waste of electricity and cloud fees: 10x the identical infrastructure, left running 10x longer than it needs to be.
It’s not like AWS, Azure, or GCP want to sell their customers cloud capacity they aren’t going to use productively. But at the same time, a hyperscaler would never suggest turning off a tenant’s reserved instances or clusters that developers might want down the road.
Rightsizing and right timing
A core principle of FinOps is rightsizing: paying for and provisioning just the right amount of capacity or resources to get the job done, and nothing more.
Loft Labs offers an interesting approach to rightsizing cloud native development environments with a multi-tenant Kubernetes platform that shares a control and management plane. This shared platform stack spins up ready ‘golden state’ configurations in seconds, complete with underlying services like logging, monitoring, and networking, and spins them down the instant they are no longer in use.
The core technology driving the platform is its open source vCluster project, which allows many virtual clusters to run as lightweight, short-lived workloads on a single shared Kubernetes cluster, each contained within its own namespace, while retaining developer work isolation and access controls on a per-vCluster basis.
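For a concrete sense of the workflow, here is a minimal sketch using the open source vcluster CLI; the cluster and namespace names are illustrative, and exact flags can vary between releases:

```
# Create a virtual cluster inside a namespace of the shared host cluster
vcluster create dev-team-a --namespace team-a

# Point kubectl at the virtual cluster; the team sees a "full" cluster,
# but its workloads actually run as pods in the team-a namespace
vcluster connect dev-team-a --namespace team-a

# Tear the environment down the instant it is no longer needed
vcluster delete dev-team-a --namespace team-a
```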
Early cost-saving estimates of this approach are promising. Loft ran a scenario analysis of an enterprise with 300 single-tenant Kubernetes clusters running on Amazon EKS, at an annual operating cost of $1,642,800. By running 300 virtual clusters on one shared Kubernetes cluster instead, that company would spend around $997,876 for the year, saving $644,924, or nearly 40 percent. Developers would see no difference in their experience.
Figure 1. Estimated cost analysis of EKS clusters alone versus virtual clusters atop a single shared multi-tenant EKS cluster. Source: Loft Labs
Additionally, a sleep mode allows vClusters to automatically suspend operations and take ‘naps’ during non-peak usage times, or whenever they are idle, and then refresh in seconds. This takes care of resource usage during irregular project schedules and is estimated to save an additional 30% in cloud costs without impacting developer availability.
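The automatic, schedule-driven sleep mode described above is a feature of Loft’s commercial platform; in the open source CLI, a rough manual equivalent is pausing a virtual cluster. A sketch, assuming the pause and resume subcommands available in recent vcluster releases:

```
# Scale the virtual cluster's control plane down and remove its workloads,
# so the paused environment consumes almost no host cluster resources
vcluster pause dev-team-a --namespace team-a

# Bring the environment back when work resumes
vcluster resume dev-team-a --namespace team-a
```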
The Intellyx Take
Of course, development platform teams could just create unique Kubernetes namespaces for each devtest environment, then let each team chart its own clusters at will, as sketched below; that’s fine if the configuration labor and cloud costs aren’t a concern for the organization. After all, it’s all free, open source tooling, right?
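A minimal sketch of that do-it-yourself path, with hypothetical team names: every environment needs at least a namespace, a resource quota, and RBAC wiring, repeated per team and per cluster.

```
# One namespace per devtest environment
kubectl create namespace team-a-dev

# Cap what the environment can consume on the shared cluster
kubectl create quota team-a-quota --namespace team-a-dev \
  --hard=cpu=8,memory=16Gi,pods=40

# Grant the team edit rights in its own namespace only
kubectl create rolebinding team-a-edit --namespace team-a-dev \
  --clusterrole=edit --group=team-a
```

Multiply that by every environment, add-on, and cluster upgrade, and the ‘redundant labor’ question above becomes very concrete.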
One of the coolest features of the cloud native development paradigm is that it purposely ‘leaves the wires hanging’ rather than dictating one way to serve complex distributed applications and organizations.
Kubernetes leaves the door open for highly compact virtual clusters that share costly cloud resources while still serving a distributed, multi-tenant development workforce: high-performance development environments that eliminate unnecessary labor costs and hold up well in budget reviews.
Originally published here.
©2023 Intellyx LLC. Intellyx is editorially responsible for this document. No AI bots were used to generate any part of this content. At the time of writing, Loft Labs is an Intellyx customer. Image source: Marc, flickr CC2.0 license.