

KVM cluster without shared storage on a SAN

[SafeKit] Synchronous real-time replication, high availability and migration of virtual machines between two servers

How does the Evidian SafeKit software implement a KVM high availability cluster without shared storage on a SAN?

The solution for KVM

Evidian SafeKit brings high availability to KVM between two servers of any brand.

This article explains how to quickly implement a KVM cluster without shared storage on a SAN and without specific skills.

The principle of the solution is to put a critical application in a virtual machine under KVM. SafeKit implements real-time replication and automatic failover of the virtual machine.

Note that KVM is the free, open-source hypervisor built into the Linux kernel and available in all mainstream Linux distributions.

A solution open to several applications

Several applications can be placed in separate virtual machines, each replicated and restarted by SafeKit. You can migrate each virtual machine between the two servers with the SafeKit console and thus balance the load in an active-active cluster.

[SafeKit] A KVM cluster without shared storage on a SAN

Save costs with this solution

There is no need for a complex VMware-type solution with three servers and shared storage on a SAN or vSAN. With SafeKit, you get synchronous real-time replication and failover of several virtual machines between two servers instead.

And with the standard virt-manager GUI, you can manage your virtual machines very simply.

Note that SafeKit can provide real-time replication and failover for any file directory and service, databases, complete Hyper-V or KVM virtual machines, Docker, Podman, K3S, and Cloud applications (see all solutions).

How does the SafeKit mirror cluster work with KVM?

The following steps are described for one virtual machine inside one mirror module. Each replicated virtual machine runs in its own independent mirror module (with a maximum of 32 virtual machines), and the primary server of each module can be either KVM server 1 or KVM server 2.

Step 1. Real-time replication

Server 1 (PRIM) runs one VM. SafeKit replicates in real time the VM files (virtual hard disk, VM configuration). Only changes made in the files are replicated across the network.
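To make "only changes are replicated" concrete, here is a minimal sketch in Python. It is not SafeKit code: the function `replicated_write` and the in-memory `bytearray` objects standing in for the virtual hard disk on each server are hypothetical names chosen for illustration. The point is that a small write to a large file sends only the modified bytes, not the whole file.

```python
def replicated_write(primary, secondary, offset, data):
    """Apply a write locally and forward only the modified bytes."""
    primary[offset:offset + len(data)] = data
    # Stand-in for the network transfer: the secondary receives the same
    # offset and bytes, never the whole file.
    secondary[offset:offset + len(data)] = data
    return len(data)  # bytes that had to cross the network


disk_size = 1024 * 1024            # assumed 1 MiB toy "virtual hard disk"
primary = bytearray(disk_size)     # copy on Server 1
secondary = bytearray(disk_size)   # copy on Server 2

sent = replicated_write(primary, secondary, 4096, b"new data")
print(sent, primary == secondary)  # prints "8 True"
```

In this toy run, 8 bytes are transferred to keep two 1 MiB copies identical.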

File replication at byte level in a KVM mirror cluster

The replication is synchronous: unlike asynchronous replication, no data is lost when a failure occurs.
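The difference between the two modes can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not SafeKit's implementation: `Node`, `sync_write`, and `async_write` are hypothetical names. In the synchronous case a write is acknowledged only once both servers hold it; in the asynchronous case the remote copy is queued, so a failure can lose whatever is still in the queue.

```python
class Node:
    """A toy server that stores written blocks in memory."""

    def __init__(self, name):
        self.name = name
        self.blocks = []

    def write(self, data):
        self.blocks.append(data)


def sync_write(primary, secondary, data):
    """Acknowledge only after both nodes hold the data."""
    primary.write(data)
    secondary.write(data)  # the caller waits for the remote copy
    return "ack"


def async_write(primary, pending, data):
    """Acknowledge after the local write; the remote copy is queued."""
    primary.write(data)
    pending.append(data)  # shipped to the secondary later
    return "ack"


# Synchronous: every acknowledged write already exists on the secondary.
p1, s1 = Node("server1"), Node("server2")
for block in (b"a", b"b", b"c"):
    sync_write(p1, s1, block)
lost_sync = len(p1.blocks) - len(s1.blocks)

# Asynchronous: the primary fails before the queue is fully flushed.
p2, s2, queue = Node("server1"), Node("server2"), []
for block in (b"a", b"b", b"c"):
    async_write(p2, queue, block)
s2.write(queue.pop(0))  # only the first queued block was shipped in time
lost_async = len(p2.blocks) - len(s2.blocks)

print(lost_sync, lost_async)  # prints "0 2"
```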

You just have to configure the VM directory name in SafeKit. There are no prerequisites on disk organization. The directory may be located on the system disk.

Step 2. Automatic failover

When Server 1 fails, Server 2 takes over. SafeKit restarts the VM on Server 2. KVM finds the files replicated by SafeKit up to date on Server 2.

The VM continues to run on Server 2 and modifies its files locally; these changes are no longer replicated to Server 1 while it is down.

Failover in a KVM mirror cluster

The failover time is equal to the fault-detection time (set to 30 seconds by default) plus the VM reboot time. 
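As a worked example of this sum, the short Python sketch below computes the failover time. The function name `failover_time_s` and the 45-second boot time are assumptions for illustration; only the 30-second default detection time comes from the text above.

```python
def failover_time_s(vm_reboot_s, fault_detection_s=30.0):
    """Failover time = fault-detection time + VM reboot time (seconds)."""
    if vm_reboot_s < 0 or fault_detection_s < 0:
        raise ValueError("durations must be non-negative")
    return fault_detection_s + vm_reboot_s


# Assumed example: a VM that takes 45 seconds to boot.
print(failover_time_s(45))  # prints "75.0"
```

With the default detection time, the VM boot time is the part of the failover time that depends on the VM itself.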

Step 3. Automatic failback

Failback involves restarting Server 1 after fixing the problem that caused it to fail. SafeKit automatically resynchronizes the VM files.

Failback in a KVM mirror cluster

This reintegration takes place without disturbing the VM, which can continue running on Server 2.

Step 4. Back to normal

After reintegration, the VM files are once again in mirror mode, as in step 1. The system is back in high-availability mode, with the VM running on Server 2 and SafeKit replicating updates to Server 1.

Passive active KVM mirror cluster with data replication

If the administrator wants the VM to run on Server 1, they can execute a "swap" command, either manually at an appropriate time or automatically through configuration.
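The four steps and the optional swap can be summarized as a small state model. This is a simplified sketch, not SafeKit's internal logic: the class `MirrorModule`, its fields, and its method names are hypothetical, and the rule that a swap requires both copies to be in sync is an assumption of this model.

```python
from dataclasses import dataclass
from typing import Optional


def other(server):
    """Return the peer of the given server in a two-node cluster."""
    return "server2" if server == "server1" else "server1"


@dataclass
class MirrorModule:
    primary: str = "server1"     # server currently running the VM
    down: Optional[str] = None   # failed server, if any
    mirrored: bool = True        # True when both copies are in sync

    def fail(self, server):
        """Step 2: a server fails; the survivor runs the VM alone."""
        self.down = server
        self.mirrored = False
        if server == self.primary:
            self.primary = other(server)  # VM restarts on the other server

    def failback(self):
        """Steps 3 and 4: the failed server returns and is resynchronized."""
        self.down = None
        self.mirrored = True  # the VM keeps running on the current primary

    def swap(self):
        """Exchange roles so the VM runs on the other server."""
        if not self.mirrored:
            # Assumption of this model, not a documented SafeKit rule.
            raise RuntimeError("swap requires both copies to be in sync")
        self.primary = other(self.primary)


module = MirrorModule()          # step 1: VM on server1, replicated to server2
module.fail("server1")           # step 2: VM restarts on server2
after_failover = module.primary
module.failback()                # steps 3-4: server1 is resynchronized
after_failback = module.primary
module.swap()                    # optional: move the VM back to server1
print(after_failover, after_failback, module.primary)
# prints "server2 server2 server1"
```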

Typical usage with SafeKit

Why limit replication to a few terabytes?

Resynchronization time after a failure (step 3)

  • 1 Gb/s network ≈ 3 hours for 1 terabyte.
  • 10 Gb/s network ≈ 1 hour for 1 terabyte, or less depending on disk write performance.
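These orders of magnitude can be checked with a back-of-the-envelope calculation. The Python sketch below gives a lower bound only: it assumes the transfer runs at the full network rate or at the disk write rate, whichever is slower, and ignores protocol overhead. The function `resync_hours` and the 300 MB/s disk write rate are assumptions for illustration.

```python
def resync_hours(size_tb, network_gbps, disk_write_mb_s=None):
    """Lower bound on the time to copy size_tb terabytes, in hours."""
    size_bytes = size_tb * 1e12
    rate = network_gbps * 1e9 / 8  # network throughput in bytes per second
    if disk_write_mb_s is not None:
        rate = min(rate, disk_write_mb_s * 1e6)  # the slower of the two wins
    return size_bytes / rate / 3600


print(round(resync_hours(1, 1), 2))        # prints "2.22"
print(round(resync_hours(1, 10), 2))       # prints "0.22"
print(round(resync_hours(1, 10, 300), 2))  # prints "0.93"
```

The line-rate bounds (about 2.2 hours at 1 Gb/s and 0.2 hours at 10 Gb/s) are below the figures quoted above, which is consistent with real transfers being slowed by overhead and, on fast networks, by disk write throughput.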

Alternative

Why limit replication to fewer than 1,000,000 files?

  • Resynchronization time performance after a failure (step 3).
  • Time to check each file between both nodes.

Alternative

  • Put the many files to replicate in a virtual hard disk / virtual machine.
  • Only the files representing the virtual hard disk / virtual machine will be replicated and resynchronized in this case.

Why limit failover to 32 replicated VMs or fewer?

  • Each VM runs in an independent mirror module.
  • Maximum of 32 mirror modules running on the same cluster.
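Because each VM has its own module and its own primary server, VMs can be spread across both servers within this limit. The Python sketch below is a hypothetical planning aid, not a SafeKit feature: `plan_primaries`, the VM names, and the load figures are invented for illustration. It rejects more than 32 VMs and assigns each VM, heaviest first, to the server with the lower total load so far.

```python
MAX_MIRROR_MODULES = 32  # limit stated above


def plan_primaries(vm_loads):
    """Assign each VM a primary server, heaviest first, to balance load."""
    if len(vm_loads) > MAX_MIRROR_MODULES:
        raise ValueError("more VMs than mirror modules allowed on one cluster")
    totals = {"server1": 0.0, "server2": 0.0}
    plan = {}
    ordered = sorted(vm_loads.items(), key=lambda kv: kv[1], reverse=True)
    for vm, load in ordered:
        target = min(totals, key=totals.get)  # least-loaded server so far
        plan[vm] = target
        totals[target] += load
    return plan, totals


# Assumed example: load expressed in arbitrary units (for instance vCPUs).
plan, totals = plan_primaries({"db": 8, "web": 4, "mail": 3, "ci": 2})
print(plan)
print(totals)
```

In this example the heaviest VM runs on one server and the three lighter ones on the other, giving totals of 8 and 9 units.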

Alternative

  • Use an external shared storage and another VM clustering solution.
  • More expensive, more complex.

Why a LAN/VLAN network between remote sites?

Alternative

  • Use a load balancer for the virtual IP address if the 2 nodes are in 2 subnets (supported by SafeKit, especially in the cloud).
  • Use backup solutions with asynchronous replication for high-latency networks.

SafeKit High Availability Differentiators