High-Availability for loft

Enterprise Feature

Running loft in a high-availability mode is an enterprise feature. Please make sure your license permits running loft in high-availability mode before you follow this guide.

By default, loft will run in a cluster as a single replica without high availability.

Enable High-Availability and run loft with multiple replicas

To configure loft to be highly available, you only have to set the environment variable LEADER_ELECTION_ENABLED to true on the loft deployment. Then you can then scale up the replicas and loft will start with leader election mode. You can do that via helm on an existing loft installation:

# Run loft with 3 replicas
helm upgrade loft loft --repo https://charts.loft.sh/ \
--namespace loft \
--reuse-values \
--set-string env.LEADER_ELECTION_ENABLED=true \
--set replicaCount=3

Now check that 3 replicas are running:

$ kubectl get po -n loft
NAME READY STATUS RESTARTS AGE
loft-84bfdb746c-7t922 1/1 Running 0 40s
loft-84bfdb746c-jpwlz 1/1 Running 0 40s
loft-84bfdb746c-9c4jx 1/1 Running 0 40s

How does it work?

To understand how loft can be made high available and scaled to multiple replicas, we have to take a look at the core components of loft:

  1. Api Server: loft starts an internal kubernetes api server that handles most of the functional requests loft receives. Since loft does not store any data in any volumes and rather saves everything in kubernetes resources, this component can be easily scaled horizontally.

  2. Controller: a significant part of loft is reacting on kubernetes resource changes and changing other kubernetes resources in return. This part is very similar to the kube-controller-manager which executes the core control loops for kubernetes. Running multiple instances of this component would result in conflicting changes and race conditions among the controllers, this is why leader-election is used. Leader election allows running multiple instances where a single leader is elected that executes the actual functionality, while all other instances are waiting. As soon as the leader fails for any reason another replica takes over.

Since for simplicity and performance reasons both components are compiled into the same binary in loft, scaling of loft works a little bit different than with kuberentes components. The difference is that loft uses a combination of leader election and simple horizontal scaling to ensure high-availability. Essentially loft uses leader election to determine which replica is a leader and which a non leader and depending on which starts in a different mode:

  1. Leader Mode: If a loft replica runs as a leader, all functionality is started like normal, so both api server and controllers are running.
  2. Non-Leader Mode: If a loft replica runs as a non leader, all functionality except the controllers are started, so only the api server is running. If the leader fails for any reason and this replica is made the new leader, loft simply starts the controller component within this replica.

If configured, loft determines automatically which replica to run in leader mode and which to run in non-leader mode. With this approach it is possible to run multiple instances of the api server and allow horizontal scaling for that component, while ensuring the controllers are only run once.