Kubernetes K3S: the simplest high availability cluster between two redundant servers

The solution for Kubernetes K3S

Evidian SafeKit brings high availability to Kubernetes K3S between two redundant servers. This article explains how to implement quickly a Kubernetes cluster on 2 nodes without NFS external storage, without an external configuration database and without specific skills.

Note that SafeKit is a generic product. You can implement with the same product real-time replication and failover of directories and services, databases, Docker, Podman, full Hyper-V or KVM virtual machines, Cloud applications (see all solutions).

This clustering solution is recognized as the simplest to implement by our customers and partners. The SafeKit solution is the perfect solution for running Kubernetes applications on premise and on 2 nodes.

We have chosen K3S as the Kubernetes engine because it is a lightweight solution for IoT & Edge computing.

The k3s.safe mirror module implements:

2 active K3S masters/agents running pods
replication of the K3S configuration database (MariaDB)
replication of persistent volumes (implemented by NFS client dynamic provisionner storage class: nfs-client)
virtual IP address, automatic failover, automatic failback

How it works?

The following table explains how the solution is working on 2 nodes. Other nodes with K3S agents (without SafeKit) can be added for horizontal scalability.

Kubernetes K3S components
SafeKit PRIM node	SafeKit SECOND node
K3S (master and agent) is running pods on the primary node	K3S (master and agent) is running pods on the secondary node
NFS Server is running on the primary node with: a virtual IP/NFS port exported NFS share K3S persistent volumes	Persistent volumes are replicated synchronously and in real-time by SafeKit on the secondary node
MariaDB server is running on the primary node with: a virtual IP/MariaDB port K3S configuration database	The configuration database is replicated synchronously and in real-time by SafeKit on the secondary node

A simple solution

SafeKit is the simplest high availabiliy solution for running Kubernetes applications on 2 nodes and on premise.

SafeKit	Benefits
Synchronous real-time replication for persistent volumes	No external NAS/NFS storage for persistent volumes
Only 2 nodes for HA of Kubernetes	No need for 3 nodes like with etcd database
Same simple product for virtual IP address, replication, failover, failback, administration, maintenance	Avoid different technologies for virtual IP (metal-lb, BGP), HA of persistent volumes, HA of configuration database
Supports disaster recovery with two remote nodes	Avoid replicated NAS storage

Step 1. File replication at byte level in a mirror cluster

This step corresponds to the following figure. Server 1 (PRIM) runs the Kubernetes K3S components explained in the previous table. Clients are connected to the virtual IP address of the mirror cluster. SafeKit replicates in real time files opened by the Kubernetes K3S components. Only changes made by the components in the files are replicated across the network, thus limiting traffic (byte-level file replication).

With a software data replication at the file level, only names of directories are configured in SafeKit. There are no pre-requisites on disk organization for the two servers. Directories to replicate may be located in the system disk. SafeKit implements synchronous replication with no data loss on failure contrary to asynchronous replication.

Step 2. Failover

When Server 1 fails, Server 2 takes over. SafeKit switches the cluster's virtual IP address and restarts the Kubernetes K3S components automatically on Server 2. The components find the files replicated by SafeKit uptodate on Server 2, thanks to the synchronous replication between Server 1 and Server 2. The components continue to run on Server 2 by locally modifying their files that are no longer replicated to Server 1.

The failover time is equal to the fault-detection time (set to 30 seconds by default) plus the components start-up time. Unlike disk replication solutions, there is no delay for remounting file system and running file system recovery procedures.

Step 3. Failback and reintegration

Failback involves restarting Server 1 after fixing the problem that caused it to fail. SafeKit automatically resynchronizes the files, updating only the files modified on Server 2 while Server 1 was halted. This reintegration takes place without disturbing the Kubernetes K3S components, which can continue running on Server 2.

If SafeKit was cleanly stopped on server 1, then at its restart, only the modified zones inside files are resynchronized, according to modification tracking bitmaps.

If server 1 crashed (power off), the modification bitmaps are not reliable and not used. All the files bearing a modification timestamp more recent than the last known synchronization point are resynchronized.

Step 4. Return to byte-level file replication in the mirror cluster

After reintegration, the files are once again in mirror mode, as in step 1. The system is back in high-availability mode, with the Kubernetes K3S components running on Server 2 and SafeKit replicating file updates to the secondary Server 1.

If the administrator wishes the Kubernetes K3S components to run on Server 1, he/she can execute a "swap" command either manually at an appropriate time, or automatically through configuration.

🔍 SafeKit High Availability Navigation Hub

Explore SafeKit: Features, technical videos, documentation, and free trial
Resource Type	Description	Direct Link
Features	Why Choose SafeKit for Simple and Cost-Effective High Availability?	View Features
Partners	SafeKit: The Benchmark in High Availability for Partners	SafeKit for Partners
VM vs App HA	SafeKit: High Availability (HA) and Redundancy Choices	VM/App Choice
Typical Usage	Typical usage with SafeKit and Limitations	Usage and Limitations
Videos	SafeKit: Technical Demonstrations and Tutorials	Watch Videos
Mirror Cluster	How the SafeKit mirror cluster works (real-time file replication and failover)?	Mirror Cluster
Farm Cluster	How the SafeKit farm cluster works (network load balancing and failover)?	Farm Cluster
Differentiators	Comparison of SafeKit with Traditional High Availability (HA) Clusters	View Benefits
Resources	SafeKit High Availability Resources, Downloads, and Documentation	Access Resources
Application Modules	SafeKit Application Module Library: Ready-to-Use Solutions	Browse Modules