Extending Kubernetes with Custom Resource Definitions (CRDs)

James Walker

March 31, 2023

Kubernetes custom resource definitions (CRDs) let you add new object types to the Kubernetes API. Kubernetes comes with many different objects that represent the most common application components, such as pods, jobs, ConfigMaps, and secrets. But what if you want to express application-specific data, such as a DatabaseConnection or AuthToken, while preserving its structure and supporting custom behavior? This is where CRDs come in.

CRDs extend the API with support for arbitrary data types. Each CRD you create gets its own API endpoints that you can use to query, create, and edit instances of that resource. Custom resources are fully supported within kubectl, so you can run commands like kubectl get backgroundjobs to interact with your application's objects.

In this article, you'll learn why CRDs are useful and how they relate to controller and operator extensions. Controllers are used to implement custom control loop mechanisms, such as crontabs and job queues, while operators are Kubernetes-specific middleware for individual apps like databases and observability stacks. Both depend heavily on CRDs.

After covering the theory, you'll also see how to register your own CRD and create object instances with kubectl.

Understanding Kubernetes Custom Resources

A custom resource is data stored in Kubernetes that doesn't match an object kind included in the default distribution. You may have already used custom resources provided by popular community projects. For example, cert-manager automates SSL certificate management using Certificate and Issuer resources. Certificates represent real SSL certificates; you can obtain one by creating a CertificateRequest, another CRD provided by cert-manager.

You can use custom resources to encapsulate data required by your own applications, too. They store and retrieve structured data via dedicated API endpoints. Compared to generic solutions such as ConfigMaps, custom resources offer clearer intent, better separation of responsibilities, and an improved management experience when you're creating many instances of a particular data structure.

They're also the foundation for extending Kubernetes with your own controllers and operators.

Custom resources aren't the right choice for every scenario, though. For example, you don't need to create custom resources for arbitrary config values used by your app. In this situation, a plain ConfigMap will be easier to work with. Custom resources should be reserved for unique functionality that's scoped to the namespace or cluster level. They're ideal for data that fits the Kubernetes declarative operation model, requires its own API, and will be managed with ecosystem tools such as kubectl and the Kubernetes dashboard.

CRDs, Controllers, and Operators

Custom resources are usually encountered alongside controllers and operators. A Kubernetes controller monitors specific resource types and carries out actions that move the cluster toward the desired state. For example, the built-in Deployment controller creates and replaces pods (through ReplicaSets) so that what's running matches each Deployment manifest in your cluster, while cert-manager's controller obtains an SSL certificate when you create a CertificateRequest object.

CRDs are rarely used without an accompanying controller. On their own, CRD instances are simple blobs of data in your cluster. If you're using custom objects purely as stored data, that's a good sign a ConfigMap would be more appropriate for the situation.

Processing CRDs with Controllers

Kubernetes controllers are loops that take actions in response to specific events occurring. The controller's cycle has three main phases:

Observe: The controller determines the cluster's desired state by watching for Kubernetes events that describe changes to the objects it manages.

Analyze: The desired state is compared to the cluster's current state. This uncovers discrepancies such as new objects that aren't in the current state or fields that have had their values updated.

Act: The controller performs all the actions necessary to transition the cluster into the desired state.
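The three phases can be sketched in a few lines of Python. This is a simplified illustration rather than how any real controller is written: the desired and current states are plain dictionaries keyed by object name, and the names Plan, analyze, and act are hypothetical helpers introduced only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Actions needed to move the current state to the desired state."""
    create: list = field(default_factory=list)
    update: list = field(default_factory=list)
    delete: list = field(default_factory=list)


def analyze(desired: dict, current: dict) -> Plan:
    """Compare two {name: spec} maps and list the discrepancies."""
    plan = Plan()
    for name, spec in desired.items():
        if name not in current:
            plan.create.append(name)  # new object that doesn't exist yet
        elif current[name] != spec:
            plan.update.append(name)  # a field value has changed
    for name in current:
        if name not in desired:
            plan.delete.append(name)  # object is no longer wanted
    return plan


def act(plan: Plan, desired: dict, current: dict) -> None:
    """Apply the plan so that `current` converges on `desired`."""
    for name in plan.create + plan.update:
        current[name] = desired[name]
    for name in plan.delete:
        del current[name]


# Observe: a real controller would learn this by watching the Kubernetes API.
desired = {"job-a": {"image": "busybox:latest"}, "job-b": {"image": "alpine:3"}}
current = {"job-b": {"image": "alpine:2"}, "job-c": {"image": "busybox:latest"}}

plan = analyze(desired, current)
act(plan, desired, current)
```

After act runs, current equals desired. A real controller repeats this cycle every time a relevant event arrives.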

Creating controllers for your CRDs lets you process their data and carry out tasks inside your cluster. Take the BackgroundJob CRD mentioned in the introduction: you could write a controller that automatically runs a command in a container whenever a new BackgroundJob object is created.

You'd write a simple YAML manifest similar to this:

apiVersion: crds.example.com/v1
kind: BackgroundJob
metadata:
  name: demo-job
spec:
  image: busybox:latest
  command: "echo hello-world"

Applying it to your cluster triggers the following cycle in the controller:

Observe: The controller watches for Kubernetes events relating to BackgroundJob objects.

Analyze: The demo-job object doesn't appear in the cluster's current state. The controller establishes that it needs to run a new job to achieve the desired state.

Act: The controller starts a new pod running the busybox:latest image and executes the specified command. The cluster's actual state now matches the desired state you've declared.
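As a rough sketch of the Act step, the function below turns a BackgroundJob object into a pod manifest. It's written under assumptions: the function name, the "-runner" pod name suffix, and the backgroundjob label are hypothetical, and a real controller would submit the result to the Kubernetes API instead of returning it.

```python
import shlex


def pod_for_background_job(job: dict) -> dict:
    """Build a Pod manifest that runs the command from a BackgroundJob object."""
    name = job["metadata"]["name"]
    spec = job["spec"]
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"{name}-runner",  # hypothetical naming scheme
            "labels": {"backgroundjob": name},  # hypothetical label
        },
        "spec": {
            "restartPolicy": "Never",  # run the command once, don't restart it
            "containers": [
                {
                    "name": "job",
                    "image": spec["image"],
                    # Split the command string into an argument list.
                    "command": shlex.split(spec["command"]),
                }
            ],
        },
    }


demo_job = {
    "apiVersion": "crds.example.com/v1",
    "kind": "BackgroundJob",
    "metadata": {"name": "demo-job"},
    "spec": {"image": "busybox:latest", "command": "echo hello-world"},
}

pod = pod_for_background_job(demo_job)
```

The resulting pod runs the busybox:latest image with the command ["echo", "hello-world"], matching the fields declared in the BackgroundJob manifest.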

Controllers extend Kubernetes with new behavior but retain the same observe-analyze-act cycle used by the built-in resources. Objects including deployments, jobs, DaemonSets, and ReplicaSets are managed by controllers that work in this way, watching for events and then applying changes that create the new state.

CRDs and controllers let you implement your own higher-level resources that modify your cluster's state and implement particular behaviors. It's this characteristic that defines when CRDs should be used. If your data is only consumed within your application and isn't supposed to cause a change in your cluster's state, it can exist as plain config data in a ConfigMap instead.

Controllers and the Operator Pattern

Operators are application-specific Kubernetes extensions. They provide controllers and CRDs that automate tasks in your cluster, such as deploying apps and performing maintenance activities like backups and migrations. The documentation describes operators as extensions that seek to "capture the key aim of a human operator who is managing a service or set of services."

Take the example of a database server. This scenario can be difficult to configure in Kubernetes because you need to set up persistent volumes to store your data, StatefulSets to reliably replicate the database instance, and services to handle networking. These implementation details require Kubernetes-specific knowledge that takes you away from the "key aim" of deploying a functioning database.

Operators neatly address the problem by extending your cluster with custom behaviors that link controllers and CRDs. A database operator could provide a DatabaseConnection CRD that lets you supply familiar configuration parameters such as the database engine, schema, and user credentials. Adding a new DatabaseConnection object to your cluster would prompt the operator's controllers to create the persistent volumes, StatefulSets, and services required for your database deployment.

Diagram showing the effects of operators in Kubernetes

Operators distill Kubernetes-specific behavior back to application requirements. The DatabaseConnection operator and CRD permit you to deploy a database while knowing only its engine, schema, and user, without having to understand any Kubernetes concepts. They differ from plain controllers by possessing domain-specific knowledge that automates key tasks.

Implementing a Custom Resource Definition

Adding your own custom resources is easier than you might think. CRDs are created as CustomResourceDefinition objects in a YAML manifest, just like other Kubernetes objects. A CRD's spec declares the names it'll be exposed under in the API and the properties that instances of the resource will possess.

To follow along with this tutorial, you'll need kubectl installed with a functioning connection to a Kubernetes cluster.

To implement the DatabaseConnection resource discussed above, copy the following YAML and save it to a new file called dbcon.yaml in your working directory:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databaseconnections.crds.example.com
spec:
  group: crds.example.com
  scope: Namespaced
  names:
    plural: databaseconnections
    singular: databaseconnection
    kind: DatabaseConnection
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                engine:
                  type: string
                defaultSchema:
                  type: string
                rootUser:
                  type: string
                rootPassword:
                  type: string

You can also find the manifest in this article's GitHub repository.

The manifest defines a new resource type inside the crds.example.com API group. There are a few details to note before you continue:

The spec.scope field declares that DatabaseConnection objects will be scoped to namespaces. To create cluster-level resources, set this field to Cluster instead of Namespaced.

The resource's API names are set within spec.names. This affects the resource's API endpoints and kubectl commands, as well as the value of the kind field when you create new object instances. The example lets you run kubectl get databaseconnection and kubectl get databaseconnections, as well as other similar commands, while using kind: DatabaseConnection in the YAML manifests of your object instances.

All Kubernetes object APIs are versioned, so you can introduce changes without breaking existing objects. A single version is defined for this CRD within its versions field. The served field controls whether the version is currently exposed to clients, while storage: true identifies the single version that is currently used for object storage.

The properties of DatabaseConnection objects are defined in OpenAPI v3 format within the schema field. This manifest states that DatabaseConnection objects will have engine, defaultSchema, rootUser, and rootPassword fields.
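To make the naming rules concrete, here's a small Python sketch of how the group, version, and plural name combine. The helper functions and the my-db object name are hypothetical; the path layout follows the standard Kubernetes convention for custom resources, and a CRD's metadata.name must take the form plural.group.

```python
def crd_name(group: str, plural: str) -> str:
    """A CRD's metadata.name must take the form <plural>.<group>."""
    return f"{plural}.{group}"


def collection_path(group, version, plural, namespace=None):
    """REST path that lists objects of a custom resource."""
    base = f"/apis/{group}/{version}"
    if namespace is None:
        # Cluster-scoped resources, or namespaced ones listed across all namespaces.
        return f"{base}/{plural}"
    return f"{base}/namespaces/{namespace}/{plural}"


def object_path(group, version, plural, name, namespace=None):
    """REST path for a single named object."""
    return f"{collection_path(group, version, plural, namespace)}/{name}"


GROUP, VERSION, PLURAL = "crds.example.com", "v1", "databaseconnections"

name = crd_name(GROUP, PLURAL)
list_url = collection_path(GROUP, VERSION, PLURAL, namespace="default")
item_url = object_path(GROUP, VERSION, PLURAL, "my-db", namespace="default")
```

For this CRD, list_url is /apis/crds.example.com/v1/namespaces/default/databaseconnections, which is the endpoint kubectl queries when you list DatabaseConnection objects in the default namespace.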

Use kubectl to add the CRD to your cluster:

$ kubectl apply -f dbcon.yaml
customresourcedefinition.apiextensions.k8s.io/databaseconnections.crds.example.com created

Provisioning the new API endpoints for the resource usually takes a few seconds to complete. You can check progress by running kubectl describe on your new CRD and inspecting the end of the output:

$ kubectl describe crd databaseconnections.crds.example.com
...
Status:
  Accepted Names:
    Kind:       DatabaseConnection
    List Kind:  DatabaseConnectionList
    Plural:     databaseconnections
    Singular:   databaseconnection
  Conditions:
    Last Transition Time:  2022-11-14T16:02:17Z
    Message:               no conflicts found
    Reason:                NoConflicts
    Status:                True
    Type:                  NamesAccepted
    Last Transition Time:  2022-11-14T16:02:18Z
    Message:               the initial names have been accepted
    Reason:                InitialNamesAccepted
    Status:                True
    Type:                  Established
  Stored Versions:
    v1
Events: <none>

Seeing the Established condition with Status: True in the Conditions list means your CRD is ready to use. You can check that it's applied correctly by using kubectl to list matching object instances:

$ kubectl get databaseconnections
No resources found in default namespace.

There are no objects yet, but the resource type has been recognized. Trying to use an unregistered type results in an error:

$ kubectl get databaseconnections2
error: the server doesn't have a resource type "databaseconnections2"
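If you'd rather check readiness from a script than read the kubectl describe output by eye, the same Established condition can be tested in code. The sketch below assumes you've loaded the CRD as JSON (for example from kubectl get crd with the -o json flag); the is_established helper and the trimmed sample document are hypothetical.

```python
import json


def is_established(crd: dict) -> bool:
    """Return True when the CRD reports an Established condition with status True."""
    conditions = crd.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Established" and c.get("status") == "True"
        for c in conditions
    )


# Trimmed sample shaped like the status section of a CRD object.
sample = json.loads("""
{
  "status": {
    "conditions": [
      {"type": "NamesAccepted", "status": "True", "reason": "NoConflicts"},
      {"type": "Established", "status": "True", "reason": "InitialNamesAccepted"}
    ]
  }
}
""")

ready = is_established(sample)
```

Note that condition statuses are the strings "True", "False", or "Unknown", not booleans, which is why the comparison is made against "True".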

Creating Objects Using Your CRD

You're now ready to create some objects using the resource type provided by your CRD. Copy the following YAML to demo-db.yaml in your working directory:

apiVersion: crds.example.com/v1
kind: DatabaseConnection
metadata:
  name: demo-database
spec:
  engine: postgres
  defaultSchema: demo-database
  rootUser: root
  rootPassword: pass

The apiVersion is set to crds.example.com/v1 because the DatabaseConnection CRD was defined within the crds.example.com API group. v1 indicates that the object's spec uses schema version v1, which was created earlier. Within the spec field, you set the properties defined in the CRD's schema; this example supplies a value for each one.

Use kubectl to add the object to your cluster:

$ kubectl apply -f demo-db.yaml
databaseconnection.crds.example.com/demo-database created

Repeat the kubectl get command to confirm that the object has been created:

$ kubectl get databaseconnections
NAME            AGE
demo-database   5m28s

Next, use the kubectl describe command to view the demo-database object's details:

$ kubectl describe databaseconnection demo-database
Name:         demo-database
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  crds.example.com/v1
Kind:         DatabaseConnection
Metadata:
  Creation Timestamp:  2022-11-14T16:26:32Z
  Generation:          1
  ...
Spec:
  Default Schema:  demo-database
  Engine:          postgres
  Root Password:   pass
  Root User:       root
Events:            <none>

The properties set in the spec are visible on the created object.

You've now successfully used a CRD to store your own structured data in your Kubernetes cluster. The API is managing DatabaseConnection objects with the specialist schema you've defined.

These objects don't have any effect on your cluster's state on their own, however. In a real scenario, you'd need to package your DatabaseConnection CRD as part of an operator that also includes controllers to observe your objects and modify the state.

With such an operator installed, applying a new DatabaseConnection object would launch a database deployment for you. The operator's controllers watch for the new object and respond by creating resources in your cluster. The added resources bring the cluster to the desired state expressed by the DatabaseConnection object.

A storage controller could provision persistent volumes, for example, while a separate replication controller initializes a StatefulSet to run a primary database node and multiple read-only replicas. Collectively, the controllers have application-specific knowledge that automates the database deployment task for you. This means they're adhering to the operator pattern.
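The fan-out from one DatabaseConnection object to several Kubernetes resources might look something like the following sketch. Everything here is an assumption made for illustration: the function name, the engine-to-image mapping, the fixed replica count of three, the app label, and the 1Gi volume size. It only builds skeleton manifests and doesn't configure primary or replica roles, which a real operator would need to handle.

```python
# Hypothetical engine-to-image mapping; a real operator would make this configurable.
ENGINE_IMAGES = {"postgres": "postgres:15", "mysql": "mysql:8"}


def plan_database_resources(db: dict) -> list:
    """Return skeleton manifests for the resources behind one DatabaseConnection."""
    name = db["metadata"]["name"]
    labels = {"app": name}  # hypothetical label scheme

    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        # A headless Service gives each StatefulSet pod a stable DNS name.
        "spec": {"clusterIP": "None", "selector": labels},
    }
    stateful_set = {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,
            "replicas": 3,  # assumed: one primary plus two read-only replicas
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {"name": "db", "image": ENGINE_IMAGES[db["spec"]["engine"]]}
                    ]
                },
            },
            # Each replica gets its own persistent volume claim.
            "volumeClaimTemplates": [
                {
                    "metadata": {"name": "data"},
                    "spec": {
                        "accessModes": ["ReadWriteOnce"],
                        "resources": {"requests": {"storage": "1Gi"}},
                    },
                }
            ],
        },
    }
    return [service, stateful_set]


demo_db = {
    "metadata": {"name": "demo-database"},
    "spec": {"engine": "postgres", "defaultSchema": "demo-database"},
}
resources = plan_database_resources(demo_db)
```

The point of the sketch is the shape of the work: a single high-level object goes in, and the lower-level Kubernetes objects needed to realize it come out.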

The CRD acts as the frontend to this automated system. You need only create a DatabaseConnection object to launch a fresh database server. If you weren't using CRDs, controllers, and the operator pattern, you'd have to manually assemble all the Kubernetes components, such as StatefulSets, volumes, services, and ConfigMaps, to bring up your containers each time.

Implementing this functionality is out of scope for this tutorial, but you can find detailed information on writing controllers and operators within the Kubernetes documentation.

CRD Schema Validation

CRDs support comprehensive schema validation controls to check whether your objects are valid. The DatabaseConnection example above only requires that the engine, defaultSchema, and user account properties are strings when they're set, but much more complicated rules are also supported using OpenAPI v3 validations.

Here's a more complex version of DatabaseConnection that adds a new replicaCount field accepting values between 1 and 10. It also marks all fields except replicaCount as required and constrains engine to only support mysql and postgres as its values. Save the manifest to dbcon-validated.yaml in your working directory:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databaseconnections.crds.example.com
spec:
  group: crds.example.com
  scope: Namespaced
  names:
    plural: databaseconnections
    singular: databaseconnection
    kind: DatabaseConnection
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                engine:
                  type: string
                  enum:
                    - mysql
                    - postgres
                replicaCount:
                  type: integer
                  minimum: 1
                  maximum: 10
                defaultSchema:
                  type: string
                rootUser:
                  type: string
                rootPassword:
                  type: string
              required:
                - engine
                - defaultSchema
                - rootUser
                - rootPassword

Apply the updated CRD to your cluster:

$ kubectl apply -f dbcon-validated.yaml

Next, save the following invalid DatabaseConnection to invalid-db.yaml:

apiVersion: crds.example.com/v1
kind: DatabaseConnection
metadata:
  name: demo-database
spec:
  engine: redis
  defaultSchema: demo-database
  rootUser: root
  rootPassword: pass

You'll see an error if you try to apply this manifest to your cluster:

$ kubectl apply -f invalid-db.yaml
The DatabaseConnection "demo-database" is invalid: spec.engine: Unsupported value: "redis": supported values: "mysql", "postgres"

The engine field is set to redis, which is unsupported by the CRD's schema. The validation constraints have prevented incorrect data from being added to your cluster.
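To see what these constraints amount to, here's a hand-written Python equivalent of the rules in dbcon-validated.yaml. It's only an illustration of the logic: Kubernetes evaluates the OpenAPI schema itself, the validate_spec function is hypothetical, and the error strings are not the exact messages the API server produces.

```python
ALLOWED_ENGINES = ("mysql", "postgres")
STRING_FIELDS = ("engine", "defaultSchema", "rootUser", "rootPassword")
REQUIRED_FIELDS = STRING_FIELDS  # every field except replicaCount is required


def validate_spec(spec: dict) -> list:
    """Return a list of problems found in a DatabaseConnection spec."""
    errors = []
    for name in REQUIRED_FIELDS:
        if name not in spec:
            errors.append(f"spec.{name}: Required value")
    for name in STRING_FIELDS:
        if name in spec and not isinstance(spec[name], str):
            errors.append(f"spec.{name}: must be a string")

    # enum: engine may only be one of the listed values
    engine = spec.get("engine")
    if isinstance(engine, str) and engine not in ALLOWED_ENGINES:
        errors.append(f'spec.engine: Unsupported value: "{engine}"')

    # minimum / maximum: replicaCount is optional but must be 1..10 when set
    if "replicaCount" in spec:
        count = spec["replicaCount"]
        # bool is a subclass of int in Python, so exclude it explicitly
        if not isinstance(count, int) or isinstance(count, bool):
            errors.append("spec.replicaCount: must be an integer")
        elif not 1 <= count <= 10:
            errors.append("spec.replicaCount: must be between 1 and 10")
    return errors


invalid = {
    "engine": "redis",
    "defaultSchema": "demo-database",
    "rootUser": "root",
    "rootPassword": "pass",
}
problems = validate_spec(invalid)
```

For the invalid manifest above, problems contains a single entry about the unsupported engine value. Changing engine to postgres yields an empty list.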

Conclusion

Custom resource definitions (CRDs) are a mechanism for registering your own object types with the Kubernetes API. They'll appear as standalone endpoints in the API and in tools like kubectl. Controllers and operators use CRDs to extend Kubernetes with new behavior. A controller will observe your objects, analyze the changes compared to the cluster's current state, and apply actions that transition the cluster into the new desired state. Operators combine controllers and CRDs with domain-specific knowledge to automate key tasks inside your cluster.

Although CRDs, controllers, and operators facilitate powerful Kubernetes customizations, they have some limitations that make them unsuitable for certain situations. CRDs can be challenging to manage in multitenant environments, for example, because a CRD is always registered for the entire cluster, even when the objects created from it are namespaced. Installing or changing a CRD for one tenant alters the API seen by every other tenant, which compromises tenant isolation.

Loft mitigates this problem by providing self-service virtual clusters that operate fully independently of each other. CRDs deployed into one virtual cluster won't affect any others. Teams can work more efficiently using CRDs without causing knock-on effects on their neighbors. Loft's solution also supports multicloud, multicluster, SSO integration, and precise role-based access control, so you can create a productive Kubernetes platform while maintaining guardrails to prevent misuse.
