Building an Internal Kubernetes Platform


Kubernetes, the container orchestration technology, has become the dominant solution for cloud infrastructure and, as such, is maturing at an unrivaled pace. Many companies have already adopted Kubernetes or are in the process of doing so. However, simply adopting Kubernetes is not enough: you also need a broader diffusion of Kubernetes throughout your organization.

Some early adopters show what the next step toward this deeper adoption of Kubernetes looks like: an internal Kubernetes platform.

Spotify, Datadog (page 25), and Box are just some publicly known examples of companies that have built such a platform for their engineering teams.

What is an internal Kubernetes platform

An internal Kubernetes platform is a company-internal platform that gives engineers direct, on-demand access to Kubernetes environments.

From an engineer’s perspective, the platform must provide self-service namespaces or other Kubernetes environments on demand. It also needs to be easy to use, even for engineers who are not experienced with Kubernetes. Finally, an internal Kubernetes platform must give engineers real Kubernetes access, meaning that they can interact with Kubernetes directly and use Kubernetes-native components and resources. It is therefore not sufficient to run a Platform-as-a-Service system on top of Kubernetes or to provide a Kubernetes test environment behind a CI/CD pipeline.
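To make "self-service namespaces with real Kubernetes access" a bit more concrete, here is a minimal Python sketch of the two objects such a platform might create when an engineer requests a namespace: the Namespace itself and a RoleBinding that grants the user Kubernetes’ built-in `edit` ClusterRole inside that namespace only. The helper name, the `example.com/owner` label key, and the choice of `edit` are assumptions for illustration, not how any particular platform does it; the output could be applied with `kubectl apply`, which accepts JSON as well as YAML.

```python
import json


def engineer_namespace(namespace: str, username: str) -> list[dict]:
    """Manifests for a self-service namespace owned by one engineer.

    Hypothetical helper; the owner label key is an assumed convention.
    """
    namespace_obj = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": namespace,
            "labels": {"example.com/owner": username},
        },
    }
    # A namespaced RoleBinding limits the built-in "edit" ClusterRole
    # to this one namespace instead of the whole cluster.
    role_binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{username}-edit", "namespace": namespace},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": "edit",
        },
        "subjects": [
            {
                "apiGroup": "rbac.authorization.k8s.io",
                "kind": "User",
                "name": username,
            }
        ],
    }
    return [namespace_obj, role_binding]


if __name__ == "__main__":
    # Wrap both objects in a v1 List so they can be applied together.
    print(json.dumps({"apiVersion": "v1", "kind": "List",
                      "items": engineer_namespace("dev-alice", "alice")},
                     indent=2))
```

Binding a ClusterRole through a namespaced RoleBinding is the usual RBAC pattern for reusing one role definition across many namespaces while keeping each grant confined to a single namespace.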

From an administrator’s perspective, an internal Kubernetes platform needs to be centrally controllable so that admins can oversee the whole system, because a company-wide roll-out leads to much higher usage than the first adoption experiments. The platform must also let admins manage and limit users efficiently, so that the total cost of running the internal Kubernetes platform does not get out of hand.
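One common Kubernetes building block for limiting users is a ResourceQuota per namespace. The Python sketch below produces a manifest that caps the CPU, memory, and pod count of one engineer’s namespace; the function name, the object name, and the default values are placeholders, not recommendations.

```python
def engineer_quota(namespace: str, cpu: str = "4", memory: str = "8Gi",
                   max_pods: int = 20) -> dict:
    """ResourceQuota capping what one engineer's namespace may consume.

    The default values are placeholders; size them for your own workloads.
    """
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "engineer-quota", "namespace": namespace},
        "spec": {
            "hard": {
                # Upper bounds on the sum over all pods in the namespace.
                "requests.cpu": cpu,
                "requests.memory": memory,
                "limits.cpu": cpu,
                "limits.memory": memory,
                # Quantities are written as strings, including object counts.
                "pods": str(max_pods),
            }
        },
    }
```

Because the quota lives in each namespace, admins can give different engineers or teams different limits simply by passing other values.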

A good internal Kubernetes platform combines all of these features in a single solution that supports all stakeholders in their further Kubernetes adoption.

Why should you have an internal Kubernetes platform

As more and more applications run on top of Kubernetes, adoption of Kubernetes also needs to spread within organizations, so that developers can interact directly with the technology underlying their applications. Only then can the Kubernetes advantages of faster development cycles and improved application stability be realized.

As the Stack Overflow Developer Survey 2020 shows, developers are ready for this next step: Kubernetes is a highly “wanted” and “loved” technology, i.e., developers who do not use it yet want to work with it, and those who already do like it.

The first step in enabling engineers to work with Kubernetes is to give them access to it through individual Kubernetes work environments. This is exactly what an internal Kubernetes platform does, which is why it is so fundamental to successful further adoption.

What kind of Kubernetes environments exist to build an internal platform

In theory, there are three options for giving engineers direct Kubernetes access: local clusters, individual clusters, and shared clusters.

1. Local Clusters: Local clusters are lightweight Kubernetes variants made to run on engineers’ own computers, such as minikube or kind. As such, they are necessarily limited to the locally available computing resources and lack some of the features of the “real” Kubernetes clusters running in cloud environments. Because they run locally, their setup is also very hard to streamline centrally, so engineers have to do it themselves, which requires some Kubernetes expertise.

For these reasons, local clusters are well suited to, and very popular in, the initial adoption and experimentation phase, but they are not the right solution for further adoption steps or for building an internal platform.

2. Individual Clusters: Individual clusters are clusters used by only one engineer. It is possible to build an internal platform that gives each of your engineers a full individual cluster. In principle, EKS, AKS, and GKE are already “external” Kubernetes platforms, and if you gave every engineer access to these public cloud provider solutions, you would have something like an “internal” Kubernetes platform. However, such a solution would be very expensive because computing resources are used extremely inefficiently (the cluster management fee alone, without any computing resources, would already be about $70 per month per engineer). It is also much harder for admins to oversee a system with a huge number of clusters and to find out which clusters are still in use and which could be deleted. Finally, most companies would not want to give every engineer direct access to their cloud provider account.

So, building an internal Kubernetes platform with individual clusters for engineers is theoretically possible, but it would be very expensive and inefficient, which is why it is rarely done in practice.

3. Shared Clusters: Shared clusters are multi-tenant clusters that are shared by several engineers or teams. Because they are real cloud-based Kubernetes clusters, they have practically unlimited computing resources and can be configured in the same way as the production system. At the same time, a single cluster is enough for a whole engineering team, which makes it easy for admins to manage and control and keeps resource utilization high.

Shared clusters are the preferred (and only feasible) way to build an internal Kubernetes platform.
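To put the cost argument against individual clusters into numbers, the following Python snippet multiplies the roughly $70 monthly management fee quoted above by a hypothetical team of 50 engineers and compares it with a single shared cluster. It deliberately ignores compute cost, and actual fees depend on the provider and pricing tier.

```python
# Approximate control-plane fee per cluster and month, as quoted in the text.
# Real pricing differs by provider and tier and may change over time.
FEE_PER_CLUSTER_PER_MONTH = 70


def yearly_management_fees(clusters: int) -> int:
    """Yearly control-plane fees for `clusters` clusters, excluding compute."""
    return clusters * FEE_PER_CLUSTER_PER_MONTH * 12


engineers = 50  # hypothetical team size
individual = yearly_management_fees(engineers)  # one cluster per engineer
shared = yearly_management_fees(1)  # one shared cluster for the whole team
print(f"individual clusters: ${individual:,} per year")  # $42,000
print(f"one shared cluster:  ${shared:,} per year")  # $840
```

The gap grows linearly with team size before a single node has been paid for.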

To get a more detailed comparison of the different types of Kubernetes environments, take a look at the comparison of local and remote clusters and the comparison of individual and shared clusters.

Namespaces vs. vClusters

If you use a shared cluster for your internal platform, engineers will often work with namespaces, which are the basic building block for multi-tenancy in Kubernetes. For many use cases, separating users via namespaces will be enough, but there are also virtual Kubernetes clusters (vClusters), which provide a harder form of multi-tenancy and thus more stability for your platform. vClusters have the additional advantage that they give engineers the feeling of working with an individual cluster, so they can freely configure everything as they need it; for example, they can even choose independently which Kubernetes version to use.
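Namespace-based separation usually needs a few extra policies on top of the namespace itself. As one hedged example, the Python sketch below produces a NetworkPolicy that lets pods in a tenant namespace receive traffic only from other pods in the same namespace, so one engineer’s workloads are not reachable from another’s. The function and policy names are assumptions; RBAC and quotas would typically complement it, and vClusters go further than this.

```python
def same_namespace_only(namespace: str) -> dict:
    """NetworkPolicy that admits ingress only from pods in the same namespace.

    It only has an effect if the cluster's network plugin enforces
    NetworkPolicy objects.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "same-namespace-only", "namespace": namespace},
        "spec": {
            # An empty podSelector selects every pod in this namespace.
            "podSelector": {},
            "policyTypes": ["Ingress"],
            # A podSelector peer without a namespaceSelector matches only
            # pods in the policy's own namespace.
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }
```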

Manual vs. Self-Service

Now the question remains how to give engineers access to namespaces or vClusters on a shared cluster. One simple solution would be for the cluster admin to create and distribute the namespaces and vClusters manually. However, this would create a highly problematic bottleneck and a huge productivity killer for engineers: a VMware survey found that “waiting for central IT to provide access to infrastructure” is the number one impediment to developer productivity.

Instead, you need an easy-to-use self-service platform, so that developers can work productively from the start and do not first have to learn much about Kubernetes details.
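A self-service platform has to turn free-form user input into valid Kubernetes object names without an admin in the loop. The sketch below, assuming a hypothetical `prefix-user-purpose` naming scheme, derives a namespace name from a user’s email address and a purpose, and checks it against the RFC 1123 label rules Kubernetes applies to namespace names (at most 63 characters, lowercase alphanumerics or `-`, starting and ending with an alphanumeric).

```python
import re

# Kubernetes namespace names must be RFC 1123 DNS labels.
DNS1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def namespace_for(user: str, purpose: str, prefix: str = "dev") -> str:
    """Build a name like 'dev-alice-api' (assumed scheme) from free-form input."""
    raw = f"{prefix}-{user.split('@')[0]}-{purpose}".lower()
    # Replace every character that is not allowed with "-".
    name = re.sub(r"[^a-z0-9-]", "-", raw)
    # Collapse runs of "-", trim to 63 characters, strip leading/trailing "-".
    name = re.sub(r"-+", "-", name)[:63].strip("-")
    if not DNS1123_LABEL.fullmatch(name):
        raise ValueError(f"cannot derive a valid namespace name from {raw!r}")
    return name
```

For example, `namespace_for("Alice.Smith@example.com", "Payment API")` returns `"dev-alice-smith-payment-api"`. Since two different inputs can map to the same name, a real platform would also check whether the namespace already exists before creating it.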

How can you build an internal Kubernetes platform

1. Choose the Kubernetes platform

As previously described, you want to build your internal platform with shared clusters. Still, you need to decide which Kubernetes platform your internal platform will run on.

Here, you have to choose between a public cloud and a private cloud environment. The most popular managed Kubernetes services in the public cloud are EKS (Amazon Web Services), AKS (Microsoft Azure), and GKE (Google Cloud).

Alternatively, you can use a multi-cloud or hybrid-cloud approach, which combines several cloud providers or even public and private clouds. Special tools such as Rancher and OpenShift can be very useful for running this type of system.

A good starting point for your internal Kubernetes platform is a single environment that best reflects your production environment. For example, if you run your Kubernetes applications on EKS in production, it makes sense to also start with EKS for your internal platform for development, testing, CI/CD, and so on. If you already have a multi-cloud system in production, you need to evaluate whether it makes sense to recreate this setup or whether a single-cloud system might be enough.

2. Define the platform architecture

As a next step, you need to determine how your platform should work and what the architecture should look like.

One question in this area is how platform users will authenticate, since they need accounts to access the platform. There are several options for handling this part of user management. The simplest is to let admins create user accounts centrally. This is a good option if you have a relatively small team with little turnover. In larger teams, it makes sense to let users sign up themselves, e.g. by allowing anyone with an email address from a specific domain to create an account, or by implementing a single sign-on (SSO) system that uses the user management of another service you already have in place, such as GitHub. To implement such a system, you might take a look at dex, an OpenID Connect provider that connects to many different services, such as LDAP, GitHub, GitLab, Google, and Microsoft.
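The email-domain rule mentioned above can be as small as the following Python check, which a hypothetical sign-up handler could run after the identity provider has verified the address. The allowed domain is a placeholder. Comparing the full domain, rather than using a suffix match, keeps look-alike domains such as `evil-example.com` out; whether subdomains should be accepted is a policy decision left open here.

```python
# Placeholder for the company's email domain(s).
ALLOWED_DOMAINS = frozenset({"example.com"})


def may_self_register(email: str,
                      allowed: frozenset[str] = ALLOWED_DOMAINS) -> bool:
    """Return True if the address belongs to an allowed domain."""
    local_part, at, domain = email.strip().rpartition("@")
    if not at or not local_part:
        # No "@" at all, or nothing before it.
        return False
    # Exact, case-insensitive comparison: no suffix or substring matching.
    return domain.lower() in allowed
```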

Another question you need to answer is how users will interact with the internal platform. Here, you can choose between a graphical UI, which might be the easiest for beginners but is also the hardest to build; a CLI, which can potentially be well integrated into engineers’ workflows; or Kubernetes Custom Resource Definitions (CRDs), which are the most Kubernetes-native option. Of course, you can also combine these options, e.g. by providing a GUI that creates custom resources in the background.
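To illustrate the "GUI that creates CRDs in the background" idea: if every front end only produces the same custom resource, a single controller in the cluster can handle all requests, and engineers who prefer kubectl can create that resource directly. The API group `platform.example.com`, the kind `DevEnvironment`, and its fields are hypothetical and only sketch what such a resource could look like; the controller that acts on it is not shown.

```python
from dataclasses import dataclass

# Hypothetical API group and kind; a real platform defines its own CRD.
API_VERSION = "platform.example.com/v1alpha1"
KIND = "DevEnvironment"


@dataclass(frozen=True)
class EnvironmentRequest:
    """What a user asks for, whether through a GUI, a CLI, or kubectl."""

    owner: str
    name: str
    env_type: str = "namespace"  # or "vcluster"

    def to_custom_resource(self) -> dict:
        """Render the request as a custom resource for a controller to act on."""
        if self.env_type not in ("namespace", "vcluster"):
            raise ValueError(f"unsupported environment type: {self.env_type}")
        return {
            "apiVersion": API_VERSION,
            "kind": KIND,
            "metadata": {"name": self.name},
            "spec": {"owner": self.owner, "type": self.env_type},
        }


# A GUI form handler and a CLI command could both end up here:
print(EnvironmentRequest(owner="alice", name="alice-api",
                         env_type="vcluster").to_custom_resource())
```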

3. Set up your platform

The third step in building a Kubernetes platform is the actual setup. Here, you face a make-or-buy decision.

The main benefit of building your own platform from scratch is that it is fully customizable, since you determine every part of it. At the same time, the setup requires a lot of work and time, and you should expect to improve your platform continuously as your users’ needs and Kubernetes itself evolve. Building your own platform is therefore mostly suited to organizations with very special needs, e.g. because they operate in a regulated industry, or to companies so large that the benefits of customization outweigh the huge effort. Spotify, for example, decided to build its own platform purely for internal use. If you decide to go this way, you can still build on existing solutions such as the previously mentioned dex or kiosk, a multi-tenancy extension for Kubernetes.

For most companies, however, buying an existing solution may be the better option: the setup is much easier and faster, you do not have to dedicate permanent development resources to improving the platform, and you automatically get some best practices. At the same time, such a solution offers only limited customization, which is why it may not be feasible for very specialized companies. One advanced off-the-shelf internal Kubernetes platform is loft. loft works with any Kubernetes cluster, provides a GUI and a CLI, and is fully based on Kubernetes CRDs. It also takes care of user management, supports SSO, and has additional features such as a sleep mode that saves cost by shutting down unused namespaces and vClusters.

4. Onboard your engineers and provide dev tooling

After setting up your internal Kubernetes platform, you need to prepare the roll-out in your organization. Keep in mind that for many engineers who have not worked with Kubernetes before, not only the platform but also the workflows will be new. You should therefore prepare extensive documentation for both the platform and the new development, testing, and deployment workflows. To make the transition as easy as possible for engineers, it makes sense to pre-configure as much as possible and to implement smart defaults.
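One example of a smart default is a LimitRange in every new namespace, so that containers that do not declare CPU and memory automatically get reasonable requests and limits. The following Python sketch builds such a manifest; the function name, the object name, and the concrete values are placeholders.

```python
def default_limit_range(namespace: str) -> dict:
    """LimitRange giving containers default CPU/memory requests and limits.

    The concrete values are placeholders; tune them to your workloads.
    """
    return {
        "apiVersion": "v1",
        "kind": "LimitRange",
        "metadata": {"name": "defaults", "namespace": namespace},
        "spec": {
            "limits": [
                {
                    "type": "Container",
                    # Used as requests when a container specifies none.
                    "defaultRequest": {"cpu": "100m", "memory": "128Mi"},
                    # Used as limits when a container specifies none.
                    "default": {"cpu": "500m", "memory": "512Mi"},
                }
            ]
        },
    }
```

Defaults like these also help when a namespace has a ResourceQuota on CPU or memory, because Kubernetes may otherwise reject pods that do not specify those values.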

Having the right developer tooling is very helpful at this stage, especially since most of these tools can be pre-configured for your typical use cases. Examples of tools made specifically for developers are DevSpace (which also has an optional loft integration if that is your chosen platform), Skaffold, and Tilt.

As a roll-out strategy, it is often a good idea to start with a single team and progress gradually. This allows you to educate the teams individually and to learn about their main challenges. If you also measure actual adoption and usage, you can uncover unexpected challenges and resistance. With this information, you can improve your system and your documentation step by step, so that the adoption process in the teams becomes easier over time.
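Measuring adoption does not have to start with a dedicated tool. As a hypothetical sketch, assuming you can export each engineer’s team and the date they last used the platform (for example from audit logs), the share of recently active engineers per team is a simple first metric:

```python
from collections import defaultdict
from datetime import date, timedelta


def adoption_by_team(engineers: dict[str, str], last_active: dict[str, date],
                     today: date, window_days: int = 30) -> dict[str, float]:
    """Share of each team's engineers who used the platform recently.

    engineers: engineer -> team; last_active: engineer -> last day of use.
    Both inputs are hypothetical exports; engineers missing from
    last_active count as inactive.
    """
    cutoff = today - timedelta(days=window_days)
    total: dict[str, int] = defaultdict(int)
    active: dict[str, int] = defaultdict(int)
    for engineer, team in engineers.items():
        total[team] += 1
        seen = last_active.get(engineer)
        if seen is not None and seen >= cutoff:
            active[team] += 1
    return {team: active[team] / total[team] for team in total}
```

Tracking this number per team over time shows where the roll-out stalls and where extra documentation or support might be needed.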

5. Control cost

As Kubernetes spreads through your organization, cost becomes a more important factor, because every engineer is now permanently using a cloud-based Kubernetes environment. Since you are using a shared environment, cloud resource usage should generally be relatively efficient. Still, there are some areas of improvement that can reduce cloud cost significantly.

To find these areas and to better understand your cost structure in general, e.g. which team causes which cost, you should monitor your cost. Tools such as Kubecost or Replex can be very helpful for this.
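Cost monitoring tools can do this kind of allocation for you; the Python sketch below only illustrates the underlying idea of attributing cost to teams through a team label on each namespace. The per-namespace cost figures and the namespace-to-team mapping are hypothetical inputs.

```python
from collections import Counter


def cost_by_team(namespace_cost: dict[str, float],
                 namespace_team: dict[str, str]) -> dict[str, float]:
    """Roll up per-namespace cost to teams via a team label.

    Namespaces without a team label are reported as "unassigned", so
    missing labels show up in the report instead of silently disappearing.
    """
    totals: Counter = Counter()
    for namespace, cost in namespace_cost.items():
        totals[namespace_team.get(namespace, "unassigned")] += cost
    return dict(totals)
```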

You should also introduce cost-control measures very early on. The most important measure is to limit each engineer’s computing resource usage, which prevents mistakes by engineers from causing excessive cost. You should also enable horizontal autoscaling for your cluster, i.e. automatically adding and removing nodes, so that your computing capacity adapts to current needs.

Cluster autoscaling is also a prerequisite for a “sleep mode”, an automatic system that shuts down unused namespaces or vClusters and thus prevents idle resources: scaling workloads down only saves money if the cluster then also removes the nodes they no longer need. The savings from a sleep mode combined with horizontal autoscaling can be enormous (around 70%), considering that engineers do not need their computing resources at night, on weekends, or during meetings, yet you still pay for them as long as they keep running.
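The sleep-mode idea and the rough size of its savings can both be sketched in a few lines of Python. The first function assumes you record a last-activity timestamp per namespace (how activity is detected is left open) and lists namespaces that have been idle longer than a threshold; a real sleep mode would then scale their workloads to zero. The second computes the share of a 168-hour week an environment could sleep: for a hypothetical 50 active hours per week it comes out at about 70%, in line with the estimate above. Treat this as an upper bound on compute savings that assumes autoscaling actually removes the freed nodes.

```python
from datetime import datetime, timedelta

HOURS_PER_WEEK = 7 * 24  # 168


def namespaces_to_sleep(last_activity: dict[str, datetime], now: datetime,
                        idle_after: timedelta = timedelta(hours=1)) -> list[str]:
    """Namespaces with no recorded activity for longer than `idle_after`."""
    return sorted(namespace for namespace, seen in last_activity.items()
                  if now - seen > idle_after)


def idle_share(active_hours_per_week: float) -> float:
    """Fraction of the week an environment could be asleep."""
    return 1 - active_hours_per_week / HOURS_PER_WEEK


# A hypothetical engineer actively using the environment 50 hours a week:
print(f"{idle_share(50):.0%}")  # 70%
```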

Conclusion

If you have successfully adopted Kubernetes and now want to take the next step, giving more engineers in your organization access to Kubernetes is inevitable. The best solution for this is an internal Kubernetes platform that lets engineers create Kubernetes work environments on demand. Such a platform lets you standardize the use of Kubernetes within your organization, so that even non-experts can work with it, while admins keep central control and oversight.

Whether you build such a platform from scratch or simply buy an existing solution, the investment is worth it, because only with an internal Kubernetes platform that gives every engineer access will you get the full benefits of Kubernetes, such as faster development cycles, improved stability, and, ultimately, better software.


Photo by Clint Adair on Unsplash