Kubernetes StatefulSet - Examples & Best Practices

Bharathiraja Shanmugam

November 23, 2021

13 Minute Read

This is some text inside of a div block.

Containers were initially designed to be stateless, so building stateful applications in them can be tricky. Luckily, Kubernetes Statefulset solves this problem.

It manages a set of replica pods, all with a unique identity. This makes it the perfect tool for managing applications that need persistent storage or a unique network identity.

This article covers what Kubernetes Statefulsets are, when to use them, and how to deploy stateful applications with them. It also covers Statefulset concepts using the MySQL database. This includes creating a fixed Pod name for each MySQL replication and accessing the replicated Pods using the Services object.

Key Takeaway

Kubernetes Statefulset is a tool for managing, deploying, and scaling stateful applications.
Unlike deployments, StatefulSet guarantees the order of pod creation and termination. This makes it better for building stateful applications.
However, they fall short in certain areas, such as volume deletion and storage management.

What Are Stateful Applications?

Stateful applications receive data and store and track them. MySQL, Oracle, and PostgreSQL are some popular stateful applications.

Stateless applications are the opposite. They do not store data; they only process it when they receive it. Node.js and Nginx are examples of stateless applications.

A modern web application uses both types of applications to serve user requests. For example, Node.js, a stateless application, receives new data on each user request. It then connects to a stateful application like MySQL to process the data. MySQL then stores and updates the data based on the user's request.

What Are StatefulSets?

Kubernetes StatefulSets are controllers that run stateful applications as containers (Pods). They assign a sticky identity—an ordinal number starting from zero—to each Pod. They do this instead of assigning random IDs to each replica Pod.

The system creates a new Pod by cloning the previous Pod’s data. If the previous Pod remains in the pending state, then the system will not create the new Pod.

If you delete a Pod, the system will delete it in reverse order, not random order. For example, if you had four replicas and scaled down to three, it would delete the Pod numbered 3.

The diagram below shows how the Pod is numbered from zero and how Kubernetes persistent volume is attached to the Pod in the StatefulSets.

StatefulSets architecture with persistent volume

When to Use StatefulSets

There are several reasons to consider using StatefulSets. Here are two examples:

Say you deployed a MySQL database in the Kubernetes cluster and scaled it to three replicas, and a frontend application wants to access the MySQL cluster to read and write data. The read request will be forwarded to three Pods. However, the write request will only be forwarded to the first (primary) Pod, and the data will be synced with the other Pods. You can achieve this by using StatefulSets.
Deleting or scaling down a StatefulSet will not delete the volumes in the stateful application. This gives your data safety. If you delete the MySQL Pod or if the MySQL Pod restarts, you can have access to the data in the same volume.

Deployment vs. StatefulSets

You can also create Pods (containers) using the Deployment object in the Kubernetes cluster. This allows you to easily replicate Pods and attach a storage volume to the Pods. The same thing can be done by using StatefulSets. What then is the advantage of using StatefulSets?

Well, the Pods created using the Deployment object are assigned random IDs. For example, you are creating a Pod named “my-app”, and you are scaling it to three replicas. The names of the Pods are created like this:

my-app-123ab my-app-098bd my-app-890yt

After the name “my-app”, random IDs are added. If the Pod restarts or you scale it down, then again, the Kubernetes Deployment object will assign different random IDs for each Pod. After restarting, the names of all Pods appear like this:

my-app-jk879 my-app-kl097 my-app-76hf7

All these Pods are associated with one load balancer service. So in a stateless application, changes in the Pod name are easily identified, and the service object easily handles the random IDs of Pods and distributes the load. This type of deployment is very suitable for stateless applications.

Stateless application scaled for three replicas

However, stateful applications cannot be deployed like this. The stateful application needs a sticky identity for each Pod because replica Pods are not identical Pods.

Take a look at the MySQL database deployment. Assume you are creating Pods for the MySQL database using the Kubernetes Deployment object and scaling the Pods. If you are writing data on one MySQL Pod, do not replicate the same data on another MySQL Pod if the Pod is restarted. This is the first problem with the Kubernetes Deployment object for the stateful application.

Stateful application Pod created with Deployment object

Stateful applications always need a sticky identity. While the Kubernetes Deployment object offers random IDs for each Pod, the Kubernetes StatefulSets controller offers an ordinal number for each Pod starting from zero, such as mysql-0, mysql-1, mysql-2, and so forth.

For stateful applications with a StatefulSet controller, it is possible to set the first Pod as primary and other Pods as replicas—the first Pod will handle both read and write requests from the user, and other Pods always sync with the first Pod for data replication. If the Pod dies, a new Pod is created with the same name.

The diagram below shows a MySQL primary and replica architecture with persistent volume and data replication architecture.

Now, add another Pod to that. The fourth Pod will only be created if the third Pod is up and running, and it will clone the data from the previous Pod.

In summary, StatefulSets provide the following advantages when compared to Deployment objects:

Ordered numbers for each Pod
The first Pod can be a primary, which makes it a good choice when creating a replicated database setup, which handles both reading and writing
Other Pods act as replicas
New Pods will only be created if the previous Pod is in running state and will clone the previous Pod’s data
Deletion of Pods occurs in reverse order

How to Create a StatefulSet in Kubernetes

Let's go through how to create a Pod for MySQL database using StatefulSet.

Create a Secret

First, create a Secret for the MySQL app. This will store sensitive information, such as usernames and passwords.

Here, I am creating a simple Secret. However, in a production environment, using the HashiCorp Vault is recommended. Use the following code to create a Secret for MySQL:

apiVersion: v1 kind: Secret metadata: name: mysql-password type: opaque stringData: MYSQL_ROOT_PASSWORD: password

Save the code using the file name mysql-secret.yaml. Then, run this command on your Kubernetes cluster:

kubectl apply -f mysql-secret.yaml

Get the list of Secrets:

kubectl get secrets

Create a MySQL StatefulSet Application

Before creating a StatefulSet application, check your volumes by getting the persistent volume list:

kubectl get pv NAME CAPACITY ACCESS MODES RECLAIM STATUS pvc-e0567 10Gi RWO Retain Bound

Next, get the persistent volume claim list:

kubectl get pvc NAME STATUS VOLUME CAPACITY ACCESS mysql-store-mysql-set-0 Bound pvc-e0567d43ffc6405b 10Gi RWO

Last, get the storage class list:

kubectl get storageclass NAME PROVISIONER RECLAIMPOLICY linode-block-storage linodebs.csi.linode.com Delete linode-block-storage-retain (default) linodebs.csi.linode.com Retain

Then use the following code to create a MySQL StatefulSet application in the Kubernetes cluster:

apiVersion: apps/v1 kind: StatefulSet metadata: name: mysql-set spec: selector: matchLabels: app: mysql serviceName: "mysql" replicas: 3 template: metadata: labels: app: mysql spec: terminationGracePeriodSeconds: 10 containers: - name: mysql image: mysql:5.7 ports: - containerPort: 3306 volumeMounts: - name: mysql-store mountPath: /var/lib/mysql env: - name: MYSQL_ROOT_PASSWORD valueFrom: secretKeyRef: name: mysql-password key: MYSQL_ROOT_PASSWORD volumeClaimTemplates: - metadata: name: mysql-store spec: accessModes: ["ReadWriteOnce"] storageClassName: "linode-block-storage-retain" resources: requests: storage: 5Gi

Here are a few things to note:

The kind is a StatefulSet. kind tells Kubernetes to create a MySQL application with the stateful feature.
The password is taken from the Secret object using the secretKeyRef.
The Linode block storage was used in the volumeClaimTemplates. If you are not mentioning any storage class name here, then it will take the default storage class in your cluster.
The replication count here is 3 (using the replica parameter), so it will create three Pods named mysql-set-0, mysql-set-1, and mysql-set-2.

Next, save the code using the file name mysql.yaml and execute using the following command:

kubectl apply -f mysql.yaml

Now that the MySQL Pods are created, get the Pods list:

kubectl get pods NAME READY STATUS RESTARTS AGE mysql-set-0 1/1 Running 0 142s mysql-set-1 1/1 Running 0 132s mysql-set-2 1/1 Running 0 120s

Create a Service for the StatefulSet Application

Now, create the service for the MySQL Pod. Do not use the load balancer for a stateful app. Instead, create a headless service for the MySQL app using this code:

apiVersion: v1 kind: Service metadata: name: mysql labels: app: mysql spec: ports: - port: 3306 clusterIP: None selector: app: mysql

Save the code using the file name mysql-service.yaml and execute using the following command:

kubectl apply -f mysql-service.yaml

Get the list of running services:

kubectl get svc

Create a Client for MySQL

To access MySQL, you will need a MySQL client tool. Deploy a MySQL client using the following manifest code:

apiVersion: v1 kind: Pod metadata: name: mysql-client spec: containers: - name: mysql-container image: alpine command: ['sh','-c', "sleep 1800m"] imagePullPolicy: IfNotPresent

Save the code using the file name mysql-client.yaml and execute using the following command:

kubectl apply -f mysql-client.yaml

Then enter this into the MySQL client:

kubectl exec --stdin --tty mysql-client -- sh

Finally, install the MySQL client tool:

apk add mysql-client

Access the MySQL Application Using the MySQL Client

Next, access the MySQL application using the MySQL client and create databases on the Pods.

If you are not already in the MySQL client Pod, enter it now:

kubectl exec -it mysql-client /bin/sh

To access MySQL, you can use the same standard MySQL command to connect with the MySQL server:

mysql -u root -p -h host-server-name

For access, you will need a MySQL server name. The syntax of the MySQL server in the Kubernetes cluster is given below:

stateful_name-ordinal_number.mysql.default.svc.cluster.local #Example mysql-set-0.mysql.default.svc.cluster.local

Connect with the MySQL primary Pod using the following command. When asked for a password, enter the one you made in the “Create a Secret” section above.

mysql -u root -p -h mysql-set-0.mysql.default.svc.cluster.local

Next, create a database on the MySQL primary, then exit:

create database erp; exit;

Now connect the other Pods and create the database like above:

mysql -u root -p -h mysql-set-1.mysql.default.svc.cluster.local mysql -u root -p -h mysql-set-2.mysql.default.svc.cluster.local

Remember that while Kubernetes helps you set up a stateful application, you will need to set up the data cloning and data sync by yourself. This cannot be done by the StatefulSets.

Best Practices

If you plan to deploy stateful apps, like Oracle, MySQL, Elasticsearch, and MongoDB, use StatefulSets. They're a great option.

You need to consider the following points while creating stateful applications:

Create a separate namespace for databases.
Place all the needed components for stateful applications, such as ConfigMaps, Secrets, and Services, in the particular namespace.
Put your custom scripts in the ConfigMaps.
Use headless service instead of load balancer service while creating Service objects.
Use the HashiCorp Vault for storing your Secrets.
Use the persistent volume storage for storing the data. Then your data won’t be deleted even if the Pod dies or crashes.

Deployment objects are the controllers most commonly used to create pods in Kubernetes. You can easily scale these Pods by mentioning the replication count in the manifest file.

For stateless applications, Deployment objects are most suitable. For example, assume you are planning to deploy your Node.js application and want to scale it to five replicas. In this case, the Deployment object is well suited.

The diagram below shows how Deployment and StatefulSets assign names to the Pods.

Pod name convention in Deployment and StatefulSets

StatefulSets create ordinal service endpoints for each Pod created using the replica option. The diagram below shows how the stateful Pod endpoints are created with ordinal numbering and how they communicate with each other.

Final Words

This article covered Kubernetes's two main controllers for creating Pods: Deployments and StatefulSets. A Deployment object is best for stateless apps, while a StatefulSet is best for stateful apps.

The StatefulSets controller assigns each Pod an ordinal number, starting at zero. It helps stateful apps set up a primary replica architecture. If a Pod dies, a new Pod is re-created using the same name. This helpful feature does not break the chain of stateful application clusters. If you are scaling down, it will be deleted in reverse order.

Use the StatefulSets controller in the Kubernetes cluster to deploy stateful applications such as Oracle, MySQL, Elasticsearch, and MongoDB. While cloning and syncing data must still be completed manually, StatefulSets eases the complexity involved in deploying stateful applications.

Photo by Photoholgic on Unsplash

Frequently Asked Questions

What are stateless and stateful Kubernetes?

Kubernetes is a deployment and scaling platform for containerized applications. Kubernetes applications usually have two states: stateful and stateless.

A stateful Kubernetes application receives data, stores it, and tracks it. A stateless Kubernetes application receives data but does not store or track it.

What are the limitations of StatefulSets in Kubernetes?

While Statefulsets is a great tool, it has issues that could affect your experience. Here are a few you should know about

Resource management: Statefulsets have two ways of storing resources: manual storage or a Persistent volume provisioner. If you use another approach, you could run into some issues. You will likely have difficulty connecting your preferred storage to pods using Statefulset.
Headless Service: To use a headless service with StatefulSet, you must set it up yourself. If you don't, it would be difficult to work with network identities.
Volume Deletion: Normally, Statefulset lets you set the order in which pods stop if rescheduled. But that changes if you delete a Statefulset completely. This could lead to potential leftover storage.

When should I use a StatefulSet?

Use Statefulset when deploying an application that needs stable identities for its pods. You don't need to use that if that's not the case.

What is the impact of node failure on StatefulSet in Kubernetes?

If a Kubernetes node fails, the application running on the Statefulset will not function well. If this happens, you'll have to mark the node out-of-service. This will forcefully delete the pods on the node, allowing it to recover on another node quickly.