Kubernetes has many advantages; among them is the ability to easily create and delete workloads as containers. Stateful applications, however, require care when handling data. A Pod has readable and writable disk space of its own, but deleting the Pod also deletes that disk space. For Pods that run databases or collect logs, losing the disk along with the Pod is a problem, so data persistence (a mechanism that keeps data even after the Pod is deleted) is required.
Kubernetes uses a highly abstract storage model for retaining data, allowing users to allocate and use volumes for containers in Pods without knowing the storage details. If you need to store data within Kubernetes, chances are you will be using persistent volumes.
A PersistentVolume (PV for short) is a piece of storage in the cluster, typically network storage, that is provisioned by an administrator or dynamically through a StorageClass. Just like a node, a PV is a cluster resource. It is an API object that captures the implementation details of the underlying storage, such as NFS, iSCSI, or a cloud provider's storage system.
In this article, you will learn more about what persistent volumes are and how best to use them.
#What are Persistent Volumes (PVs)?
A PV defines a piece of storage in the cluster, including details such as its storage class and the storage implementation behind it. Unlike an ordinary volume, a PV is a resource object in the Kubernetes cluster; creating a PV is equivalent to creating a storage resource object. To use this resource, a user must request it through a persistent volume claim (PVC). A PVC is a request for storage that is used to mount a PV into a Pod. The cluster administrator can map different storage classes to different service levels and backend policies.
A persistent volume is defined in a YAML configuration file that specifies, among other things, which volume plugin to use. The following YAML configuration file defines a persistent volume that provides 5Gi of storage space. The volume mode is Filesystem, the access mode is ReadWriteOnce, and the volume is reclaimed through the Recycle reclaim policy. Finally, the storage class is specified as slow, and the NFS plug-in type is used.

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pv0005
    spec:
      capacity:
        storage: 5Gi
      volumeMode: Filesystem
      accessModes:
        - ReadWriteOnce
      persistentVolumeReclaimPolicy: Recycle
      storageClassName: slow
      mountOptions:
        - hard
        - nfsvers=4.1
      nfs:
        path: /tmp
        server: 172.17.0.2

#Features of Persistent Volumes
Each PV contains a specification (spec) and status of the volume. This section describes the spec attributes of a PV configuration file, with reference to the YAML configuration file example given above.
Generally, a PV specifies its storage capacity through the capacity property. Currently, storage size is the only resource that can be set or requested there. In the future, capacity may include attributes such as IOPS and throughput.
Kubernetes supports two volume modes for persistent volumes: Filesystem and Block. With Block, the volume is presented to the Pod as a raw block device without a filesystem. Filesystem is the default mode if the volume mode is not defined.
A PV can be mounted using the following access modes:
- ReadOnlyMany (ROX): the volume can be mounted read-only by many nodes.
- ReadWriteOnce (RWO): the volume can be mounted read-write by a single node.
- ReadWriteMany (RWX): the volume can be mounted read-write by many nodes.
A volume can only be mounted using one access mode at a time, even if it supports many access modes.
A PV can belong to a StorageClass, which is specified via the storageClassName property. A PV of a particular class can only be bound to PVCs that request that class. A PV without a storageClassName has no class and can only be bound to PVCs that do not request a specific class.
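As a minimal sketch of class matching, the following claim requests the slow class and could therefore bind to the pv0005 volume shown earlier. The claim name slow-claim is a hypothetical placeholder and is not part of the original example.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: slow-claim            # hypothetical name chosen for illustration
spec:
  storageClassName: slow      # must match the storageClassName of the PV
  volumeMode: Filesystem      # same volume mode as pv0005
  accessModes:
    - ReadWriteOnce           # an access mode that pv0005 offers
  resources:
    requests:
      storage: 5Gi            # no larger than the capacity of pv0005
```

A claim that named a different class, or asked for more than 5Gi, would not match this PV and would stay unbound.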
When the claim on a PV is released, the reclaim policy determines what happens to the volume. The available policies are:
- Retain: the PV and its data are kept until an administrator reclaims them manually.
- Recycle: the volume is scrubbed with a basic delete of its contents and made available again for a new claim. This policy is deprecated in favor of dynamic provisioning.
- Delete: the associated storage asset (such as an AWS EBS, GCE PD, Azure Disk, or OpenStack Cinder volume) is deleted.
Currently, only NFS and hostPath support the Recycle policy. AWS EBS, GCE PD, Azure Disk, and Cinder volumes support the Delete policy.
Kubernetes administrators can specify mount options for mounting persistent volumes on a node. Not all PV types support mount options.
Volume types that support mount options include:
- RBD (Rados Block Device)
- Cinder (OpenStack volume storage)
#What are Persistent Volume Claims (PVCs)?
PVC is a declaration defining the request for storage data usage, which is mounted into a Pod for use. PVC is configured for use by developers, who do not necessarily care about the specific implementation of the underlying data storage, but more so about the business-related data storage size, access methods, etc.
Here is the configuration file for a PersistentVolumeClaim:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: pv0004
    spec:
      storageClassName: manual
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 3Gi

#Expanding Persistent Volume Claims
Support for expanding persistent volume claims (PVCs) is enabled by default. The following are volumes that can be expanded:
- Azure File
- Azure Disk
The allowVolumeExpansion field of the claim's StorageClass must be set to true to expand a PVC:

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: pv0003
    provisioner: kubernetes.io/glusterfs
    parameters:
      resturl: "http://192.168.10.100:8080"
      restuser: ""
      secretNamespace: ""
      secretName: ""
    allowVolumeExpansion: true

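Once the storage class allows expansion, a claim is typically grown by raising its requested size. The sketch below assumes a claim named expandable-claim (a hypothetical name) that was originally created with a smaller request against the pv0003 class above; the sizes are illustrative only.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: expandable-claim      # hypothetical claim name
spec:
  storageClassName: pv0003    # the StorageClass above, which sets allowVolumeExpansion: true
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi            # raised from an assumed original request of 5Gi
```

Re-applying the claim with the larger value asks Kubernetes to resize the underlying volume. Whether the filesystem can grow while a Pod is using it depends on the volume type.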
#Lifecycle of PV and PVC
In a Kubernetes cluster, a PV exists as a storage resource in the cluster. PVCs are requests for those resources and also act as claim checks to the resource. The interaction between PVs and PVCs follows this lifecycle:
- Provisioning: the creation of the PV, either directly (static) or dynamically using a StorageClass.
- Binding: assigning the PV to the PVC.
- Using: Pods use the volume through the PVC.
- Reclaiming: the PV is reclaimed, either by keeping it for the next use or by deleting it directly from the cloud storage.
A volume will be in one of the following states:
- Available: the PV is a free resource that is not yet bound to a PVC.
- Bound: the PV has been assigned to a PVC.
- Released: the claim has been deleted, but the cluster has not yet reclaimed the resource.
- Failed: the automatic reclamation of the PV has failed.
There are two ways to provision persistent storage volumes in Kubernetes: statically and dynamically.
With static provisioning, the cluster administrator creates PVs in advance. These PVs exist in the Kubernetes API, represent real storage, and are available to all users in the cluster. The developer then creates the PVC and the Pod, and the Pod uses the storage provided by the PV through the PVC.
With dynamic provisioning, when none of the static PVs created by the administrator matches a user's PVC, the cluster tries to provision a storage volume for the PVC automatically, based on a StorageClass. The PVC must request a storage class, and that class must have been created and configured by the administrator beforehand. To let claims that request no class use dynamic provisioning as well, the cluster administrator needs to enable the DefaultStorageClass admission controller on the API server.
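The dynamic path described above can be sketched with a default StorageClass and a claim that names no class at all. The class name, provisioner, and claim name below are hypothetical placeholders; a real cluster would use the provisioner of its own storage backend.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-example                                   # hypothetical class name
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"    # marks this class as the cluster default
provisioner: example.com/hypothetical-provisioner          # placeholder, not a real provisioner
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dynamic-claim                                      # hypothetical claim name
spec:
  # storageClassName is omitted on purpose, so the default class applies
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

With the DefaultStorageClass admission controller enabled, the claim is assigned the default class and a matching PV is provisioned for it on demand.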
The user creates a PVC (or has previously created one for dynamic provisioning), specifying the requested storage size and access mode. A control loop in the control plane watches for new PVCs, finds a matching PV (if any), and binds the two together. If a PV was dynamically provisioned for a new PVC, the loop always binds that PV to the PVC. Otherwise, the user always gets at least the storage requested, although the volume may be larger than the request. Once bound, a PVC and its PV are bound exclusively, regardless of how they were bound.
If no matching PV is found, the PVC remains unbound indefinitely and is bound as soon as a matching PV becomes available. For example, a cluster provisioned with many 50Gi PVs cannot satisfy a PVC that requests 100Gi; that PVC stays unbound until a 100Gi PV is added to the cluster.
The Pod uses the PVC as a volume: the Kubernetes cluster looks up the PV bound to the PVC and mounts it for the Pod. For volumes that support multiple access modes, the user specifies which mode is desired when using the claim as a volume in a Pod. Once a user has a bound PVC, the bound PV belongs to that user for as long as it is needed, and the user accesses it through the PVC listed in the Pod's volumes.
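As a minimal sketch of the using phase, the Pod below mounts the claim named pv0004 from the earlier PVC example. The Pod name, container image, and mount path are assumptions chosen for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pvc-demo-pod                  # hypothetical Pod name
spec:
  containers:
    - name: app
      image: nginx                    # illustrative image, any container works
      volumeMounts:
        - name: data                  # refers to the volume declared below
          mountPath: /usr/share/nginx/html
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: pv0004             # the PVC defined earlier; must be in the same namespace
```

The Pod never references the PV directly. Kubernetes resolves the claim to whichever PV it is bound to.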
When users are done with a volume, they can delete the PVC object from the API, which allows the resource to be reclaimed. The reclaim policy of a PersistentVolume tells the cluster what to do with the volume after it has been released from its claim. Currently, volumes can either be retained or deleted.
The Retain reclaim policy allows for manual reclamation of the resource. When the PVC is deleted, the PV still exists, and the volume is considered “released.” However, it is not yet available for another claim because the previous claimant’s data remains on the volume.
For storage volume plugins that support the Delete reclaim policy, deletion removes the PV from Kubernetes. It also removes the storage asset from the associated external infrastructure, such as AWS EBS, GCE PD, Azure Disk, or Cinder storage volumes.
#Common Use Cases
Now, take a look at a few examples to learn about common use cases.
The following config file describes a single-instance MySQL Deployment. The MySQL container mounts the PV at /var/lib/mysql. The MYSQL_ROOT_PASSWORD environment variable sets the database password from the Secret.

    apiVersion: v1
    kind: Service
    metadata:
      name: wordpress-mysql
      labels:
        app: wordpress
    spec:
      ports:
        - port: 3306
      selector:
        app: wordpress
        tier: mysql
      clusterIP: None
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: mysql-pv-claim
      labels:
        app: wordpress
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 20Gi
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: wordpress-mysql
      labels:
        app: wordpress
    spec:
      selector:
        matchLabels:
          app: wordpress
          tier: mysql
      strategy:
        type: Recreate
      template:
        metadata:
          labels:
            app: wordpress
            tier: mysql
        spec:
          containers:
            - image: mysql:5.6
              name: mysql
              env:
                - name: MYSQL_ROOT_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: mysql-pass
                      key: password
              ports:
                - containerPort: 3306
                  name: mysql
              volumeMounts:
                - name: mysql-persistent-storage
                  mountPath: /var/lib/mysql
          volumes:
            - name: mysql-persistent-storage
              persistentVolumeClaim:
                claimName: mysql-pv-claim

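The Deployment above reads its password from a Secret named mysql-pass with a key called password. A minimal sketch of such a Secret, written by hand rather than generated, could look like the following; the password value is a placeholder and must be replaced.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: mysql-pass            # name referenced by secretKeyRef in the Deployment
type: Opaque
stringData:
  password: change-me         # placeholder value, assumed for illustration only
```

The stringData field accepts the value as plain text, and Kubernetes stores it base64-encoded under data.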
The following config file describes a single-instance WordPress Deployment. The WordPress container mounts the PV at /var/www/html for website data files. The WORDPRESS_DB_HOST environment variable sets the name of the MySQL Service defined above, and WordPress will access the database by Service. The WORDPRESS_DB_PASSWORD environment variable sets the database password from the Secret generated by kustomize.

    apiVersion: v1
    kind: Service
    metadata:
      name: wordpress
      labels:
        app: wordpress
    spec:
      ports:
        - port: 80
      selector:
        app: wordpress
        tier: frontend
      type: LoadBalancer
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: wp-pv-claim
      labels:
        app: wordpress
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 20Gi
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: wordpress
      labels:
        app: wordpress
    spec:
      selector:
        matchLabels:
          app: wordpress
          tier: frontend
      strategy:
        type: Recreate
      template:
        metadata:
          labels:
            app: wordpress
            tier: frontend
        spec:
          containers:
            - image: wordpress:4.8-apache
              name: wordpress
              env:
                - name: WORDPRESS_DB_HOST
                  value: wordpress-mysql
                - name: WORDPRESS_DB_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: mysql-pass
                      key: password
              ports:
                - containerPort: 80
                  name: wordpress
              volumeMounts:
                - name: wordpress-persistent-storage
                  mountPath: /var/www/html
          volumes:
            - name: wordpress-persistent-storage
              persistentVolumeClaim:
                claimName: wp-pv-claim

The following config file describes a PVC requesting a raw block volume.

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: block-pvc
    spec:
      accessModes:
        - ReadWriteOnce
      volumeMode: Block
      resources:
        requests:
          storage: 10Gi

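A raw block claim is consumed differently from a filesystem claim: the container lists it under volumeDevices with a device path instead of under volumeMounts. The sketch below assumes the block-pvc claim above; the Pod name, image, command, and device path are illustrative choices.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-block-volume         # hypothetical Pod name
spec:
  containers:
    - name: app
      image: busybox                  # illustrative image
      command: ["sh", "-c", "sleep 3600"]
      volumeDevices:                  # used instead of volumeMounts for Block volumes
        - name: data
          devicePath: /dev/xvda       # assumed path where the raw device appears in the container
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: block-pvc          # the raw block PVC defined above
```

The application is then responsible for reading and writing the device directly, since no filesystem is created on it.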
The following config file describes creating a PVC from a volume snapshot.

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: restore-pvc
    spec:
      storageClassName: csi-hostpath-sc
      dataSource:
        name: new-snapshot-test
        kind: VolumeSnapshot
        apiGroup: snapshot.storage.k8s.io
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi

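The restore above presupposes that a VolumeSnapshot named new-snapshot-test already exists. A minimal sketch of how such a snapshot might be declared follows; the snapshot class name and the source claim name are assumptions, and the cluster needs a CSI driver with snapshot support plus the snapshot CRDs installed.

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: new-snapshot-test                             # name referenced by the dataSource of restore-pvc
spec:
  volumeSnapshotClassName: csi-hostpath-snapclass     # assumed snapshot class name
  source:
    persistentVolumeClaimName: source-pvc             # hypothetical PVC to take the snapshot from
```

The snapshot and the claim restored from it must live in the same namespace.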
The following config file describes creating a PersistentVolumeClaim from an existing PVC.

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: cloned-pvc
    spec:
      storageClassName: my-csi-plugin
      dataSource:
        name: existing-src-pvc-name
        kind: PersistentVolumeClaim
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi

When configuring a PV, Kubernetes documentation recommends the following set of best practices to keep in mind:
To reduce management overhead and enable scaling, avoid statically creating and assigning persistent volumes. Instead, use dynamic provisioning. In your storage class, define the appropriate reclaim policy to minimize storage costs once Pods are deleted.
Each node size supports a maximum number of attached volumes, and different node sizes provide different amounts of local storage and capacity. Plan appropriately for your application demands to deploy the right size of nodes.
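The advice on dynamic provisioning and reclaim policies can be sketched as a StorageClass that states its reclaim policy explicitly. The class name and provisioner below are hypothetical; the right policy depends on whether the data should outlive its claim.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-retain-example                           # hypothetical, descriptive class name
provisioner: example.com/hypothetical-provisioner     # placeholder, not a real provisioner
reclaimPolicy: Retain                                 # keep the volume after its claim is deleted; use Delete to free storage automatically
volumeBindingMode: WaitForFirstConsumer               # delay provisioning until a Pod actually uses the claim
```

PVs created through this class inherit its reclaim policy. Choosing Delete instead avoids paying for storage that no claim references any longer.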
The persistent volume (PV) lifecycle is independent of any particular container in the cluster. A persistent volume claim (PVC) is a request made by a container user or application for a specific type of storage. When creating a PV, Kubernetes documentation recommends the following:
- Always include PVCs in the container configuration.
- Never include PVs in container configuration, as this will tightly couple a container to a specific volume.
- Always have a default StorageClass; otherwise, PVCs that don’t specify a specific class will fail.
- Give StorageClasses meaningful names.
Kubernetes persistent storage offers Kubernetes applications a convenient way to request and consume storage resources. PVCs and PVs are analogous to interfaces and implementations in object-oriented programming. The Pod created by the user declares the PVC, and Kubernetes finds a PV to pair it with. If no PV matches, Kubernetes uses the corresponding StorageClass to create one and then binds it to the PVC. For a newly created PV backed by remote storage, the control plane first attaches the remote disk to the host, and the kubelet on that node then mounts the attached disk into a host directory.
PVs are ideal if you have data that has to be shared between Pods or that must survive restarts. PVs can be defined and tied to a specific Pod, and, therefore, you can now run data-driven applications on Kubernetes as well.