Kubernetes Probes: Startup, Liveness, Readiness

Levent Ogut

Kubernetes has been disruptive due to the scalability, velocity, portability, and observability it adds to cloud deployments. While it brings a whole ecosystem of great features and options and eases complex deployment, it also has its own challenges. One of the great features Kubernetes has brought us is that of high availability. There are many high availability options in Kubernetes; in this article, we will discuss high availability options used for the application/microservice itself.

Pods, the smallest deployable units in Kubernetes, are scheduled once the declarative configuration is applied. The kube-scheduler selects a node for each pod; once the pod is bound to that node, it runs in a controlled environment, and the pod conditions indicate whether it is ready for service. By using startup, readiness, and liveness probes, we can control when a pod should be deemed started, ready for service, or live. We will explore these conditions and triggers.

Kubernetes Probes Diagram

#Pod and Container Status

Pods have phases and conditions; containers have states. These status properties can and will be changed based on probe results, so let’s explore them.

#Pod Phases

The pod status object includes a phase field. This field tells Kubernetes and us where in its execution cycle a pod is.

  • Pending: Accepted by the cluster, containers are not set up yet.
  • Running: At least one container is in a running, starting, or restarting state.
  • Succeeded: All of the containers exited with a status code of zero; the pod will not be restarted.
  • Failed: All containers have terminated and at least one container exited with a status code of non-zero.
  • Unknown: The state of the pod cannot be determined.

#Pod Conditions

As well as pod phases, there are pod conditions. These also give information about the state the pod is in.

  • PodScheduled: A Node has been successfully selected to schedule the pod, and scheduling is completed.
  • ContainersReady: All the containers are ready.
  • Initialized: All init containers have completed successfully.
  • Ready: The pod is able to serve requests; hence it should be included in the service and load balancers.

We can view the pod conditions with the kubectl describe pods <POD_NAME> command.

kubectl describe pods <POD_NAME>

Sample output is as follows:

...
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
...

#Container States

A container can be in one of three states.

  • Waiting: The container is still running the operations it needs to complete startup, such as pulling the container image.
  • Running: The container is executing.
  • Terminated: Container started execution and finished by either success or failure.

#Exploring Status on Pod Object

We can see the pod conditions and container states on a Pod object by issuing the kubectl get pods <POD_NAME> -o yaml command.

...
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-02-08T11:11:53Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2021-02-08T11:14:20Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2021-02-08T11:14:20Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2021-02-08T11:11:52Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://7fc67a850ba439f64ecb51a129a2d7dcbc4a3402b253daa3a6827787f7c80e40
    image: docker.io/library/nginx:latest
    imageID: docker.io/library/nginx@sha256:10b8cc432d56da8b61b070f4c7d2543a9ed17c2b23010b43af434fd40e2ca4aa
    lastState:
      terminated:
        containerID: containerd://c4416e69b7348a7e7be3f7046dc9745dfb38ba537e5b8c06da5020c67b12b3d8
        exitCode: 137
        finishedAt: "2021-02-08T11:14:52Z"
        reason: Error
        startedAt: "2021-02-08T11:14:05Z"
    name: nginx
    ready: true
    restartCount: 1
    started: true
    state:
      running:
        startedAt: "2021-02-08T11:16:28Z"
  hostIP: x.x.x.x
  phase: Running
  podIP: 10.1.239.205
  podIPs:
  - ip: 10.1.239.205
  qosClass: BestEffort
  startTime: "2021-02-08T11:11:53Z"

If you prefer JSON, you can use kubectl get pods <POD_NAME> -o jsonpath='{.status}' | jq

{
  "conditions": [
    {
      "lastProbeTime": null,
      "lastTransitionTime": "2021-02-08T11:11:53Z",
      "status": "True",
      "type": "Initialized"
    },
    {
      "lastProbeTime": null,
      "lastTransitionTime": "2021-02-08T11:14:20Z",
      "status": "True",
      "type": "Ready"
    },
    {
      "lastProbeTime": null,
      "lastTransitionTime": "2021-02-08T11:14:20Z",
      "status": "True",
      "type": "ContainersReady"
    },
    {
      "lastProbeTime": null,
      "lastTransitionTime": "2021-02-08T11:11:52Z",
      "status": "True",
      "type": "PodScheduled"
    }
  ],
  "containerStatuses": [
    {
      "containerID": "containerd://7fc67a850ba439f64ecb51a129a2d7dcbc4a3402b253daa3a6827787f7c80e40",
      "image": "docker.io/library/nginx:latest",
      "imageID": "docker.io/library/nginx@sha256:10b8cc432d56da8b61b070f4c7d2543a9ed17c2b23010b43af434fd40e2ca4aa",
      "lastState": {
        "terminated": {
          "containerID": "containerd://c4416e69b7348a7e7be3f7046dc9745dfb38ba537e5b8c06da5020c67b12b3d8",
          "exitCode": 137,
          "finishedAt": "2021-02-08T11:14:52Z",
          "reason": "Error",
          "startedAt": "2021-02-08T11:14:05Z"
        }
      },
      "name": "nginx",
      "ready": true,
      "restartCount": 1,
      "started": true,
      "state": {
        "running": {
          "startedAt": "2021-02-08T11:16:28Z"
        }
      }
    }
  ],
  "hostIP": "x.x.x.x",
  "phase": "Running",
  "podIP": "10.1.239.205",
  "podIPs": [
    {
      "ip": "10.1.239.205"
    }
  ],
  "qosClass": "BestEffort",
  "startTime": "2021-02-08T11:11:53Z"
}

#Probes in Kubernetes

Kubernetes provides probes (health checks) to monitor and act on the state or condition of the pods, to make sure only healthy pods serve traffic.

The kubelet is the component responsible for running the health checks and updating the API server with the results.

#Probe Handlers

There are three available handlers that can cover almost any scenario.

#Exec Action

ExecAction executes a command inside the container; the probe succeeds if the command exits with status code 0. This is a catch-all handler, since we can run any executable: for example, a script running several curl requests to determine the status, or an executable that connects to an external dependency. Make sure that the executable does not create zombie processes.
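As a minimal sketch of an exec handler, the manifest below checks for a marker file with cat. The pod name, the busybox image, the /tmp/healthy path, and the timing values are all assumptions chosen for illustration, not part of any real application.

```yaml
# Hypothetical pod for illustration; names, image, and timings are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: exec-probe-demo
spec:
  containers:
  - name: app
    image: busybox:1.36
    # Create a marker file, then stay alive so the probe has something to check.
    command: ["sh", "-c", "touch /tmp/healthy && sleep 3600"]
    livenessProbe:
      exec:
        # Exit code 0 means healthy; any other exit code counts as a failure.
        command: ["cat", "/tmp/healthy"]
      periodSeconds: 5
```

Deleting /tmp/healthy inside the container should make the command exit non-zero, which is how the probe starts failing in this sketch.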

#TCP Socket Action

TCPSocketAction connects to a defined port to check whether the port is open; it is mostly used for endpoints that do not speak HTTP.

#HTTP Get Action

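HTTPGetAction sends an HTTP GET request to the defined path and port. The HTTP response code determines whether the probe is successful: any code greater than or equal to 200 and less than 400 counts as success, and any other code counts as failure.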
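The two network-based handlers might be configured as in the sketch below, which uses the nginx image from the earlier status output. The pod name and the choice of which probe uses which handler are assumptions made only to show both handlers in one manifest.

```yaml
# Hypothetical pod for illustration; the name and probe pairing are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: network-probe-demo
spec:
  containers:
  - name: nginx
    image: nginx:latest
    ports:
    - containerPort: 80
    readinessProbe:
      # TCP handler: succeeds if a connection to port 80 can be opened.
      tcpSocket:
        port: 80
    livenessProbe:
      # HTTP handler: succeeds if GET / returns a status code from 200 to 399.
      httpGet:
        path: /
        port: 80
```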

#Common Probe Parameters

Each type of probe has common configurable fields:

  • initialDelaySeconds: Seconds to wait after the container has started before probes are initiated. (default: 0)
  • periodSeconds: How often, in seconds, the probe is performed. (default: 10)
  • timeoutSeconds: Seconds after which the probe times out if no response is received. (default: 1)
  • successThreshold: Number of consecutive successes required to transition from a failed to a healthy state; must be 1 for liveness and startup probes. (default: 1)
  • failureThreshold: Number of consecutive failures required to transition from a healthy to a failed state. (default: 3)

As you can see, we can configure probes in detail. For successful probe configuration, we need to analyze the requirements and dependencies of our application/microservice.
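To make the fields concrete, here is a sketch of a readiness probe with every common field written out at its default value. The pod name is an assumption, and in practice you would only list the fields you want to change.

```yaml
# Hypothetical pod for illustration; all probe fields are set to their defaults.
apiVersion: v1
kind: Pod
metadata:
  name: probe-fields-demo
spec:
  containers:
  - name: nginx
    image: nginx:latest
    readinessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 0   # start probing as soon as the container starts
      periodSeconds: 10        # run the probe every 10 seconds
      timeoutSeconds: 1        # a probe with no response within 1 second fails
      successThreshold: 1      # one success marks the container ready again
      failureThreshold: 3      # three consecutive failures mark it not ready
```

With these values, a container that stops responding would be marked not ready after roughly 30 seconds (3 failures at a 10-second period).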

#Startup Probes

If your process requires time to get ready (reading a file, parsing a large configuration, preparing some data, and so on), you should use startup probes. Liveness and readiness probes do not run until the startup probe has succeeded. If the startup probe keeps failing and the failure threshold is exceeded, the container is restarted so the operation can start over. You need to adjust failureThreshold and periodSeconds accordingly, since the process has roughly failureThreshold × periodSeconds seconds (plus any initialDelaySeconds) to complete its startup. Otherwise, you can find your pod in a loop of restarts.
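A startup probe for a slow-starting process could look like the sketch below. The pod name and the 300-second startup budget are assumptions; the budget comes from multiplying failureThreshold by periodSeconds, so pick values that fit your own application.

```yaml
# Hypothetical pod for illustration; the name and time budget are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-demo
spec:
  containers:
  - name: nginx
    image: nginx:latest
    startupProbe:
      httpGet:
        path: /
        port: 80
      periodSeconds: 10
      failureThreshold: 30   # startup budget: 30 x 10 s = 300 s
    livenessProbe:
      # Does not run until the startup probe has succeeded once.
      httpGet:
        path: /
        port: 80
      periodSeconds: 10
      failureThreshold: 3
```

Splitting the work this way lets the liveness probe stay strict (about 30 seconds to detect a hang) without killing the container during a long startup.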

#Readiness Probes

If you want to control the traffic sent to the pod, you ought to use readiness probes. Readiness probes set the pod’s Ready condition, which determines whether the pod is included in the service and load balancers. When the probe succeeds enough times (the success threshold), the pod can receive traffic and is included in the service and load balancers. Readiness probes are also the right tool if your process needs to take itself out of the service temporarily, for maintenance or while reading a large amount of data to be used for the service: the pod can signal to the kubelet via the readiness probe that it wants out of the service for a while.
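One way to let a process take itself out of the service is a readiness probe that checks for a marker file, as sketched below. The pod name, the busybox image, the /tmp/ready path, and the thresholds are all assumptions for illustration; a real application would more likely expose an HTTP readiness endpoint.

```yaml
# Hypothetical pod for illustration; names, path, and thresholds are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: readiness-probe-demo
spec:
  containers:
  - name: app
    image: busybox:1.36
    # The process creates /tmp/ready when it can serve, and may remove it later.
    command: ["sh", "-c", "touch /tmp/ready && sleep 3600"]
    readinessProbe:
      exec:
        command: ["cat", "/tmp/ready"]
      periodSeconds: 5
      successThreshold: 2   # two consecutive successes before traffic resumes
      failureThreshold: 3   # three consecutive failures before removal
```

Removing the file, for example with kubectl exec readiness-probe-demo -- rm /tmp/ready, should flip the Ready condition to False without restarting the container.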

#Liveness Probes

If your container cannot crash by itself when an unexpected error occurs, then use liveness probes. Using liveness probes can overcome some of the bugs the process might have. The kubelet restarts the container, subject to the pod’s restartPolicy, once the liveness probe has failed failureThreshold times in a row.

If your process can handle these errors by exiting, you don’t need to use liveness probes; however, it is advantageous to use them anyway to accommodate unknown bugs until they are fixed.
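A liveness probe along these lines might be declared as in the following sketch. The pod name and the timing values are assumptions; restartPolicy is written out only to make the restart behavior explicit, since Always is already the default for pods.

```yaml
# Hypothetical pod for illustration; the name and timings are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: liveness-probe-demo
spec:
  restartPolicy: Always   # default; the container is restarted after it is killed
  containers:
  - name: nginx
    image: nginx:latest
    livenessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 5   # give the server a moment before the first check
      periodSeconds: 10
      failureThreshold: 3      # restart after 3 consecutive failures (about 30 s)
```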

#Example: Kubernetes API

The Kubernetes API server exposes health check endpoints as well: healthz (deprecated), readyz, livez.

Let’s look at the readyz endpoint, designed to be used with readiness probes.

kubectl get --raw='/readyz?verbose'

The health of each individual check is listed, and the results are combined into an overall health status.

[+]ping ok
[+]log ok
[+]etcd ok
[+]informer-sync ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
[+]shutdown ok
healthz check passed

Let’s look at the livez endpoint.

kubectl get --raw='/livez?verbose'

Again, the health of each individual check is listed, and the results are combined into an overall health status.

[+]ping ok
[+]log ok
[+]etcd ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
healthz check passed
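These endpoints map naturally onto probe definitions. The fragment below is a sketch of how an API server container might wire livez and readyz into its own probes; it is not taken from any particular distribution, and the port 6443, the HTTPS scheme, and every timing value are assumptions.

```yaml
# Fragment of a container spec, not a complete manifest.
# Port, scheme, and timings are assumptions for illustration.
livenessProbe:
  httpGet:
    path: /livez
    port: 6443
    scheme: HTTPS
  periodSeconds: 10
  failureThreshold: 8
readinessProbe:
  httpGet:
    path: /readyz
    port: 6443
    scheme: HTTPS
  periodSeconds: 1
  failureThreshold: 3
```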

#Conclusion

We have explored Kubernetes probes; they are an essential part of the high availability equation. On the other hand, a misconfiguration can adversely affect the availability of our applications/microservices. It is of paramount importance to configure probes appropriately and test different scenarios to find the optimal values; we also need to think about the stability of external dependencies and whether to include checks on them in the probe response endpoint. We have seen that the readiness probe’s action is to remove the pod from, or include it in, the service and load balancers, while the liveness probe’s action is to restart the container once failures exceed the threshold. You can find links to previous articles detailing Readiness, Liveness, and Startup Probes in the further reading section.

#Further Reading