Kubernetes Persistent Volumes: Examples & Best Practices

Shingai Zivuku

Nov 3, 2021

Kubernetes has many advantages; among them is the ability to easily create and delete workloads as containers. When running stateful applications, however, care must be taken when handling data. Containers in a Pod can read and write disk space inside the Pod, but that space is ephemeral: deleting the Pod also deletes it. For Pods that run databases or collect logs, losing the disk together with the Pod is unacceptable. So data persistence, a mechanism that keeps data even after the Pod is deleted, is required.

Kubernetes uses a highly abstract storage model for retaining data, allowing users to allocate and use volumes for containers in Pods without knowing the storage details. If you need to store data within Kubernetes, chances are you will be using persistent volumes. A PersistentVolume (PV for short) is a piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using a StorageClass. Just like a node, a PV is a resource in the cluster. The PV is an API object that captures the details of the storage implementation, such as NFS, iSCSI, or a cloud provider's storage system.

In this article, you will learn more about what persistent volumes are and how best to use them.

#What are Persistent Volumes (PVs)?

A PV describes a piece of storage: its capacity, access modes, storage class, and the backend that implements it. Unlike ordinary volumes, a PV is a resource object in a Kubernetes cluster; creating a PV is equivalent to creating a storage resource object. To use this resource, it must be requested through a PersistentVolumeClaim (PVC). A PVC is a request for storage; a Pod references the PVC to mount the PV that is bound to it. Through StorageClasses, the cluster administrator can map different classes to different service levels and backend policies.

A PV is defined in a YAML configuration file, which also specifies the volume plugin type to use. The following configuration file defines a PV with 5Gi of storage capacity. The volume mode is Filesystem, the access mode is ReadWriteOnce, and the reclaim policy is Recycle. Finally, the storage class is set to slow, and the volume uses the NFS plugin, mounted with the hard and nfsvers=4.1 options.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0005
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  storageClassName: slow
  mountOptions:
    - hard
    - nfsvers=4.1
  nfs:
    path: /tmp
    server: 172.17.0.2

#Exploring the Purpose and Benefits of Persistent Volumes

Persistent Volumes are designed to tackle the challenges of managing data storage in a Kubernetes cluster. By providing a storage abstraction layer, PVs allow for seamless storage management while ensuring data persistence. This abstraction enables developers and operators to focus on application logic rather than worrying about the underlying storage infrastructure.

One of the key benefits of using Persistent Volumes is their ability to decouple storage from individual pods. This decoupling allows for greater flexibility in managing pods, as the data stored in a Persistent Volume remains intact even if a pod is deleted or rescheduled. Additionally, when the underlying storage supports a shared access mode such as ReadWriteMany, multiple pods can read and write the same volume simultaneously.

When it comes to data storage in a Kubernetes cluster, Persistent Volumes play a crucial role. They act as a bridge between the applications running in pods and the underlying storage infrastructure. By abstracting away the complexities of storage management, PVs simplify the deployment and scaling of applications, making it easier for developers to focus on their core tasks.

Furthermore, Persistent Volumes provide a reliable and consistent storage solution. The data stored in a PV is not tied to a specific pod, so even if a pod fails or is rescheduled, the data remains accessible. This is critical for applications that require persistent storage.

#Features of Persistent Volumes

Each PV contains a specification (spec) and status of the volume. This section describes the spec attributes of a PV configuration file, with reference to the YAML configuration file example given above.

#Capacity

Generally, a PV specifies its storage capacity, which is set with the PV's capacity attribute. Currently, storage size is the only resource that can be set or requested. Future attributes may include IOPS, throughput, and so on.

#Volume Modes

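Kubernetes supports two volume modes for persistent volumes: Filesystem and Block. A Filesystem volume is mounted into Pods as a directory, while a Block volume is presented to the Pod as a raw block device without any filesystem on it. Filesystem is the default mode if volumeMode is omitted.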

#Access Modes

ReadWriteOnce (RWO) allows the volume to be mounted as read-write by a single node; multiple Pods running on that node can still use it.
ReadOnlyMany (ROX) allows the volume to be mounted as read-only by many nodes.
ReadWriteMany (RWX) allows the volume to be mounted as read-write by many nodes.

A volume can only be mounted using one access mode at a time, even if it supports many access modes.
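
As an illustration, an NFS-backed PV can advertise every mode NFS supports, while each claim requests only the mode it needs. In this sketch the object names and export path are hypothetical, and the server address is reused from the PV example above; the shared manual class name is what lets the claim bind to this PV.

```yaml
# Hypothetical NFS-backed PV. NFS supports all three access modes,
# so the PV lists all of them.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-nfs-pv            # placeholder name
spec:
  storageClassName: manual       # must match the claim's class for binding
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
    - ReadOnlyMany
    - ReadWriteMany
  nfs:
    server: 172.17.0.2           # placeholder server from the example above
    path: /exports/shared        # placeholder export path
---
# Hypothetical claim that only needs read-only access from many nodes,
# so it requests a single mode that the PV supports.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-read-pvc
spec:
  storageClassName: manual
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 10Gi
```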

#Class

A PV can belong to a class, which is specified by setting its storageClassName attribute to the name of a StorageClass. A PV of a particular class can only be bound to PVCs requesting that class. A PV with no storageClassName has no class and can only be bound to PVCs that request no particular class.
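
For example, a claim that sets storageClassName to an empty string explicitly requests a PV with no class. It can therefore only bind to a class-less PV and is never dynamically provisioned, even if the cluster has a default StorageClass. The claim name and size below are hypothetical.

```yaml
# Hypothetical claim that may only bind to a PV without a storageClassName.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: classless-claim
spec:
  storageClassName: ""    # empty string: no class, no dynamic provisioning
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```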

#Reclaim Policy

When a user deletes a PVC because the storage is no longer needed, the PV's reclaim policy determines what happens to the volume:

Retain - the PV and its data are kept until an administrator reclaims them manually.
Recycle - the volume's contents are scrubbed and the PV becomes available to a new claim; the old data cannot be recovered.
Delete - the PV and the associated storage asset (such as an AWS EBS, GCE PD, Azure Disk, or OpenStack Cinder volume) are deleted.

Currently, only NFS and hostPath support the Recycle policy, which is deprecated in favor of dynamic provisioning. AWS EBS, GCE PD, Azure Disk, and Cinder volumes support the Delete policy.

#Mount Options

Kubernetes administrators can specify mount options for mounting persistent volumes on a node. Not all PV types support mount options, and Kubernetes does not validate them: if a mount option is invalid, the mount fails.

Volume types that support mount options include:

gcePersistentDisk
awsElasticBlockStore
AzureDisk
NFS
RBD (Rados Block Device)
CephFS
Cinder (OpenStack volume storage)
Glusterfs

#What are Persistent Volume Claims (PVCs)?

A PVC is a request for storage that a Pod references in order to mount a volume. PVCs are typically written by developers, who care less about how the underlying storage is implemented than about the storage size, access modes, and other requirements of their application.

Here is an example PersistentVolumeClaim that requests 3Gi of ReadWriteOnce storage from the manual storage class:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv0004
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi

#Expanding Persistent Volume Claims

Support for expanding persistent volume claims (PVCs) is enabled by default. The following volume types can be expanded:

gcePersistentDisk
awsElasticBlockStore
Cinder
Glusterfs
RBD
Azure File
Azure Disk
Portworx
FlexVolumes
CSI

A PVC can only be expanded if its StorageClass has the allowVolumeExpansion field set to true:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pv0003
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://192.168.10.100:8080"
  restuser: ""
  secretNamespace: ""
  secretName: ""
allowVolumeExpansion: true
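
To expand a claim, you edit the PVC and raise its storage request; Kubernetes then resizes the underlying volume. The following is a minimal sketch: the claim name gluster-data and the sizes are hypothetical, and the class is the pv0003 StorageClass shown above.

```yaml
# Hypothetical claim provisioned from the expandable pv0003 class.
# It was created with 10Gi; raising the request to 20Gi and re-applying
# the manifest asks Kubernetes to grow the underlying volume.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gluster-data           # placeholder name
spec:
  storageClassName: pv0003     # the StorageClass shown above
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi            # previously 10Gi; requests can grow, not shrink
```

Requests can only be increased: Kubernetes rejects an update that lowers a PVC's storage request.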

#Lifecycle of PV and PVC

In a Kubernetes cluster, a PV exists as a storage resource in the cluster. PVCs are requests for those resources and also act as claim checks to the resource. The interaction between PVs and PVCs follows this lifecycle:

Provisioning - the creation of the PV, either statically by an administrator or dynamically using a StorageClass.
Binding - assigning the PV to the PVC.
Using - Pods use the volume through the PVC.
Reclaiming - the PV is reclaimed, either by keeping it for the next use or by deleting it directly from the cloud storage.

A volume will be in one of the following states:

Available - this state shows that the PV is ready to be used by the PVC.
Bound - this state shows that the PV has been assigned to a PVC.
Released - the claim has been deleted, but the cluster has not yet reclaimed the resource.
Failed - the volume's automatic reclamation has failed.

#Provisioning

There are two ways to provision persistent storage volumes in Kubernetes:

#Static

With static provisioning, the cluster administrator creates a number of PVs in advance. They carry the details of real storage that is available to users in the cluster, and they exist in the Kubernetes API. The developer then creates the PVC and the Pod, and the Pod uses the storage provided by the PV through the PVC.
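
As a sketch of static provisioning, an administrator could create the following PV to satisfy the pv0004 claim shown earlier, which requests 3Gi of ReadWriteOnce storage in the manual class. The PV name and host path are illustrative. Because hostPath uses a directory on a single node, this setup only suits single-node test clusters.

```yaml
# Hypothetical PV created in advance by an administrator.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume
spec:
  storageClassName: manual     # matches the pv0004 claim
  capacity:
    storage: 10Gi              # larger than the 3Gi request, which is allowed
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /mnt/data            # directory on the node (test clusters only)
```

Because this PV offers 10Gi, the 3Gi claim receives the whole volume once the two are bound.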

#Dynamic

With dynamic provisioning, when none of the static PVs created by the administrator match a user's PVC, the cluster may try to provision a volume specifically for that PVC, based on a StorageClass. The PVC must request a storage class, and that class must have been created and configured by the administrator in advance. The cluster administrator also needs to enable the DefaultStorageClass admission controller on the API server so that claims that do not name a class are assigned the default one.
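
A sketch of the dynamic path follows, assuming a cluster that runs the AWS EBS CSI driver (provisioner ebs.csi.aws.com). The class name fast, the gp3 parameter, and the claim name are illustrative; substitute a provisioner and parameters that exist in your cluster.

```yaml
# Hypothetical StorageClass created in advance by the administrator.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: ebs.csi.aws.com   # assumption: AWS EBS CSI driver is installed
parameters:
  type: gp3
---
# A claim that names the class. If no existing PV matches, the provisioner
# creates a new volume and a PV for it, which is then bound to this claim.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fast-claim
spec:
  storageClassName: fast
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
```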

#Binding

A user creates a PVC (or, with dynamic provisioning, has already created one) that specifies the requested storage size and access mode. A control loop in the control plane watches for new PVCs, finds a matching PV (if possible), and binds them together. If a PV was dynamically provisioned for a new PVC, the loop always binds that PV to the PVC. Otherwise, users always get at least what they asked for, but the volume may be larger than what was requested. Once bound, PVC-to-PV bindings are exclusive and one-to-one, regardless of how they were bound.

If no matching PV exists, the PVC remains unbound indefinitely and is bound as soon as a matching PV becomes available. For example, a cluster provisioned with many 50Gi PVs would not match a PVC requesting 100Gi; the PVC could be bound once a 100Gi PV is added to the cluster.
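
A claim can also be bound to one specific PV ahead of time. The sketch below follows the reservation pattern described in the Kubernetes documentation: the PVC names the PV in volumeName, and the PV names the PVC in claimRef so that no other claim can take it. All names, the namespace, and the NFS server and path are hypothetical placeholders.

```yaml
# Hypothetical PV reserved for one claim through claimRef.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: reserved-pv
spec:
  storageClassName: ""
  claimRef:                    # only this claim may bind to the PV
    name: reserved-pvc
    namespace: default
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  nfs:
    server: nfs.example.internal   # placeholder server
    path: /exports/reserved        # placeholder export path
---
# The claim names the PV directly. The empty storageClassName keeps the
# default StorageClass from being applied to it.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: reserved-pvc
  namespace: default
spec:
  storageClassName: ""
  volumeName: reserved-pv
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```

The control plane still checks that the storage class, access modes, and requested size are compatible before binding the pair.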

#Using

A Pod uses a PVC as a volume: the cluster looks up the PV bound to that PVC and mounts it into the Pod. For volumes that support multiple access modes, the user specifies which mode is desired when using their claim as a volume in a Pod. Once a user has a bound PVC, the bound PV belongs to that user for as long as they need it. Users access their claimed PV by including a persistentVolumeClaim entry in the volumes section of the Pod spec.
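
A minimal sketch of a Pod that uses the pv0004 claim from the PersistentVolumeClaim example above follows. The Pod name, container image, and mount path are illustrative assumptions, and the Pod must be created in the same namespace as the claim.

```yaml
# Hypothetical Pod that mounts the storage claimed by pv0004.
apiVersion: v1
kind: Pod
metadata:
  name: task-pv-pod
spec:
  volumes:
    - name: task-pv-storage
      persistentVolumeClaim:
        claimName: pv0004        # refers to the PVC, not to a PV
  containers:
    - name: web
      image: nginx
      volumeMounts:
        - name: task-pv-storage  # must match the volume name above
          mountPath: /usr/share/nginx/html
```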

#Reclaiming

When a user is done with their volume, they can delete the PVC object from the API, which allows the resource to be reclaimed. The reclaim policy of a PersistentVolume tells the cluster what to do with the volume after it has been released from its claim. Volumes can be retained, recycled, or deleted; the two commonly used policies, Retain and Delete, are described below.

#Retain

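The Retain reclaim policy allows for manual reclamation of the resource. When the PVC is deleted, the PV still exists, and the volume is considered “released.” However, it is not yet available for another claim because the previous claimant’s data remains on the volume. To reclaim it, an administrator deletes the PV, cleans up the data on the associated storage asset, and then either deletes the asset or creates a new PV that reuses it.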

#Delete

For storage volume plugins that support a Delete reclaim policy, deletion removes the PV from Kubernetes. Also, it removes the storage asset from the associated external infrastructure, such as AWS EBS, GCE PD, Azure Disk, or Cinder storage volumes.

#Common Use Cases

One common use case for Persistent Volumes is database management. Databases often require persistent storage to maintain data integrity and ensure high availability. By using a Persistent Volume, administrators can ensure that the data stored in the database remains persistent even if the corresponding pod goes offline or gets rescheduled.

Another use case for Persistent Volumes is file sharing and collaboration applications. Applications like content management systems or file servers often require shared storage where multiple pods can read and write data simultaneously. Persistent Volumes backed by storage that supports the ReadWriteMany access mode, such as NFS, meet these requirements.

In addition to databases and file sharing applications, Persistent Volumes can also be beneficial in other scenarios. For example, in machine learning applications, where large datasets need to be stored and accessed by multiple pods simultaneously, PVs offer a scalable and efficient solution. Similarly, in IoT deployments, where sensor data needs to be collected and stored persistently, Persistent Volumes can ensure data reliability and availability.

Overall, Persistent Volumes are a powerful tool in the Kubernetes ecosystem. They provide a flexible and reliable way to manage data storage, decoupling it from individual pods and enabling seamless scaling and management of applications. By understanding the purpose and benefits of Persistent Volumes, developers and operators can make informed decisions about when and how to leverage this technology in their applications.

Now, take a look at a few examples to learn about common use cases.

#Example 1:

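The following config file describes a single-instance MySQL Deployment with a headless Service and a 20Gi PVC. Because the PVC does not set storageClassName, it is provisioned from the cluster's default StorageClass, if one is configured. The MySQL container mounts the PV at /var/lib/mysql, and the MYSQL_ROOT_PASSWORD environment variable sets the database password from the mysql-pass Secret. The Recreate strategy makes Kubernetes stop the old Pod before starting a new one, so the ReadWriteOnce volume is released before the replacement Pod needs it.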

apiVersion: v1
kind: Service
metadata:
  name: wordpress-mysql
  labels:
    app: wordpress
spec:
  ports:
    - port: 3306
  selector:
    app: wordpress
    tier: mysql
  clusterIP: None
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pv-claim
  labels:
    app: wordpress
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress-mysql
  labels:
    app: wordpress
spec:
  selector:
    matchLabels:
      app: wordpress
      tier: mysql
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: wordpress
        tier: mysql
    spec:
      containers:
      - image: mysql:5.6
        name: mysql
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-pass
              key: password
        ports:
        - containerPort: 3306
          name: mysql
        volumeMounts:
        - name: mysql-persistent-storage
          mountPath: /var/lib/mysql
      volumes:
      - name: mysql-persistent-storage
        persistentVolumeClaim:
          claimName: mysql-pv-claim
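
Both this Deployment and the WordPress Deployment in the next example read the password from a Secret named mysql-pass, which neither config file defines. A minimal hand-written equivalent might look like the following; the password value is a placeholder, and the Secret must exist in the same namespace before the Deployments start.

```yaml
# Hypothetical Secret consumed by both Deployments through secretKeyRef.
# stringData accepts plain text; the API server stores it base64-encoded.
apiVersion: v1
kind: Secret
metadata:
  name: mysql-pass
type: Opaque
stringData:
  password: change-me   # placeholder; never commit a real password
```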

#Example 2:

The following config file describes a single-instance WordPress Deployment exposed through a LoadBalancer Service. The WordPress container mounts the PV at /var/www/html for website data files. The WORDPRESS_DB_HOST environment variable is set to the name of the MySQL Service defined above, so WordPress reaches the database through that Service. The WORDPRESS_DB_PASSWORD environment variable reads the database password from the same mysql-pass Secret, which is generated with kustomize.

apiVersion: v1
kind: Service
metadata:
  name: wordpress
  labels:
    app: wordpress
spec:
  ports:
    - port: 80
  selector:
    app: wordpress
    tier: frontend
  type: LoadBalancer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wp-pv-claim
  labels:
    app: wordpress
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress
  labels:
    app: wordpress
spec:
  selector:
    matchLabels:
      app: wordpress
      tier: frontend
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: wordpress
        tier: frontend
    spec:
      containers:
      - image: wordpress:4.8-apache
        name: wordpress
        env:
        - name: WORDPRESS_DB_HOST
          value: wordpress-mysql
        - name: WORDPRESS_DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-pass
              key: password
        ports:
        - containerPort: 80
          name: wordpress
        volumeMounts:
        - name: wordpress-persistent-storage
          mountPath: /var/www/html
      volumes:
      - name: wordpress-persistent-storage
        persistentVolumeClaim:
          claimName: wp-pv-claim

#Example 3:

The following config file describes a PVC requesting a raw block volume. It can only be satisfied by storage that supports volumeMode: Block.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: block-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block
  resources:
    requests:
      storage: 10Gi
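
A Pod consumes a block-mode claim through volumeDevices rather than volumeMounts, and the container sees a device node instead of a directory. The following sketch assumes the block-pvc claim above; the Pod name, image, command, and device path are illustrative.

```yaml
# Hypothetical Pod that consumes block-pvc as a raw device.
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-block-volume
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeDevices:             # block volumes use volumeDevices, not volumeMounts
        - name: data
          devicePath: /dev/xvda  # device node created inside the container
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: block-pvc
```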

#Example 4:

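The following config file describes creating a PVC from a volume snapshot. This requires a CSI driver that supports snapshots, and the VolumeSnapshot named in dataSource must exist.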

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restore-pvc
spec:
  storageClassName: csi-hostpath-sc
  dataSource:
    name: new-snapshot-test
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
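
The claim above restores from a VolumeSnapshot named new-snapshot-test. A sketch of that snapshot follows. It assumes the VolumeSnapshot CRDs and snapshot controller are installed, that a VolumeSnapshotClass named csi-hostpath-snapclass exists, and that pvc-test is a hypothetical existing claim served by the same CSI driver.

```yaml
# Hypothetical snapshot of the existing claim pvc-test.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: new-snapshot-test
spec:
  volumeSnapshotClassName: csi-hostpath-snapclass  # assumed to exist
  source:
    persistentVolumeClaimName: pvc-test            # hypothetical source PVC
```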

#Example 5:

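The following config file describes creating a PersistentVolumeClaim by cloning an existing PVC. Cloning is only available for CSI drivers, and the source PVC must be in the same namespace as the new claim.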

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cloned-pvc
spec:
  storageClassName: my-csi-plugin
  dataSource:
    name: existing-src-pvc-name
    kind: PersistentVolumeClaim
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

#Best Practices

When configuring a PV, Kubernetes documentation recommends the following set of best practices to keep in mind:

To reduce management overhead and enable scaling, avoid statically creating and assigning persistent volumes. Instead, use dynamic provisioning. In your storage class, define the appropriate reclaim policy to minimize storage costs once Pods are deleted.
Each node can attach only a limited number of volumes, and different node sizes provide different amounts of local storage and capacity. Plan appropriately for your application demands to deploy the right size of nodes.
The persistent volume (PV) lifecycle is independent of any particular container in the cluster. Persistent volume claims (PVC) are a request made by a container user or application for a specific type of storage. When writing configuration that uses persistent storage, Kubernetes documentation recommends the following:
- Always include PVCs in the container configuration.
- Never include PVs in container configuration, as this will tightly couple a container to a specific volume.
- Always have a default StorageClass; otherwise, PVCs that don’t specify a class cannot be dynamically provisioned and stay unbound unless a matching PV without a class exists.
- Give StorageClasses meaningful names.
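
The StorageClass recommendations above can be combined in one object. The following is a sketch, assuming the AWS EBS CSI driver (ebs.csi.aws.com) as the provisioner; the class name and parameters are illustrative, so substitute the provisioner your cluster actually runs.

```yaml
# Hypothetical default StorageClass with a descriptive name.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd-delete               # says what the class provides
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"  # marks it as default
provisioner: ebs.csi.aws.com     # assumption: replace with your provisioner
parameters:
  type: gp3
reclaimPolicy: Delete            # remove the volume when its PVC is deleted
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer  # provision once a Pod is scheduled
```

With this class marked as default, a PVC that omits storageClassName is provisioned from it. Delete stops storage costs once the claim is gone, while Retain is the safer choice for data that must outlive its claim. WaitForFirstConsumer delays provisioning until a Pod using the claim is scheduled, so the volume is created where that Pod can reach it.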

#Exploring the Difference Between Kubernetes Volumes and Persistent Volumes

In Kubernetes, many volume types, such as emptyDir, are essentially temporary directories that containers within a pod can mount. Their data survives container restarts but is lost once the pod is deleted. In contrast, Persistent Volumes decouple storage from pods, enabling data persistence even as pods come and go.

Moreover, Persistent Volumes in Kubernetes offer support for various storage technologies, including local storage, network-attached storage (NAS), and cloud storage platforms. This flexibility ensures that developers can meet their specific storage requirements without being bound to a single solution.

Ephemeral volumes such as emptyDir are tightly coupled with the lifecycle of a pod. When the pod is deleted or rescheduled to another node, the data stored within them is lost. This limitation can be problematic for applications that require data persistence, such as databases or file storage systems.

On the other hand, Persistent Volumes provide a solution to this problem by decoupling storage from pods. This means that even if a pod is terminated or rescheduled, the data stored in the Persistent Volume remains intact. This decoupling allows for data persistence and ensures that applications can rely on their data even in the face of pod failures or changes.
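
The difference is easiest to see in a single Pod spec that uses both kinds of volume. In this sketch, the emptyDir volume is removed along with the Pod, while the data stored through the app-data claim (a hypothetical, already-created PVC) survives it. The Pod name, image, and paths are illustrative.

```yaml
# Hypothetical Pod with one ephemeral and one persistent volume.
apiVersion: v1
kind: Pod
metadata:
  name: cache-and-data
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: scratch
          mountPath: /tmp/scratch   # deleted together with the Pod
        - name: data
          mountPath: /data          # stored on the PV, outlives the Pod
  volumes:
    - name: scratch
      emptyDir: {}
    - name: data
      persistentVolumeClaim:
        claimName: app-data         # hypothetical existing PVC
```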

Furthermore, Persistent Volumes offer support for various storage technologies, giving developers the flexibility to choose the most suitable option for their specific needs. Whether it’s local storage for high-performance applications, network-attached storage (NAS) for shared access, or cloud storage platforms for scalability, Kubernetes' Persistent Volumes can accommodate a wide range of storage requirements.

Overall, Persistent Volumes empower developers to build scalable and resilient applications in Kubernetes, with the assurance that their data will persist across pod lifecycles.

#Conclusion

Kubernetes persistent storage offers Kubernetes applications a convenient way to request and consume storage resources. PVCs and PVs are analogous to interfaces and implementations in object-oriented programming. The user's Pod declares a PVC, and Kubernetes finds a PV to pair it with. If no existing PV matches, Kubernetes uses the StorageClass named in the PVC to provision a new PV and then binds it to the PVC. For a network-attached volume, the control plane first attaches the remote disk to the node where the Pod is scheduled, and then the kubelet on that node mounts the disk into a host directory, from which it is exposed to the Pod's containers.

PVs are ideal if you have data that has to be shared between Pods or that must survive restarts. Because a Pod only references a PVC, and Kubernetes handles finding or provisioning the PV behind it, you can now run data-driven applications on Kubernetes as well.
