Difference between High Availability and Fault Tolerance
Evidian SafeKit
What is the difference between high availability and fault tolerance?
Overview
This article explores the pros and cons of a high availability cluster versus a fault-tolerant system by looking at hardware constraints, software failures, recovery time (RTO), and data loss (RPO), among other criteria.
The comparison tables below detail the differences between a fault-tolerant system and SafeKit, a software high availability cluster.
What is high availability?
A high availability cluster is built on two servers: the critical application is restarted when a hardware or software failure occurs. There are two types of clusters: hardware clusters and software clusters.
Hardware clusters are based on shared disks resulting in dependencies between servers and their connections to shared disk arrays.
Software clusters like Evidian SafeKit are based on real-time data replication and are hardware-agnostic: they can be deployed on physical or virtual servers or in the cloud.
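The restart-based behavior described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not SafeKit's implementation: the `Server` and `MirrorCluster` classes, the `check_and_failover` method, and the file path are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server model used only for this illustration."""
    name: str
    alive: bool = True
    running_app: bool = False
    files: dict = field(default_factory=dict)  # replicated application data


class MirrorCluster:
    """Toy two-node cluster: the primary runs the app, the secondary stands by."""

    def __init__(self, primary: Server, secondary: Server):
        self.primary = primary
        self.secondary = secondary
        self.primary.running_app = True

    def write(self, path: str, data: str) -> None:
        # Real-time replication: the write is applied on both servers.
        self.primary.files[path] = data
        if self.secondary.alive:
            self.secondary.files[path] = data

    def check_and_failover(self) -> bool:
        """Restart the application on the secondary if the primary has failed."""
        if self.primary.alive:
            return False
        if not self.secondary.alive:
            raise RuntimeError("both servers are down")
        self.primary.running_app = False
        # Swap roles, then restart the application on the surviving server.
        self.primary, self.secondary = self.secondary, self.primary
        self.primary.running_app = True
        return True


cluster = MirrorCluster(Server("node1"), Server("node2"))
cluster.write("/data/orders.db", "order-42")
cluster.primary.alive = False  # simulate a hardware failure on node1
cluster.check_and_failover()
print(cluster.primary.name, cluster.primary.files)
```

In this model the application is unavailable between the failure and the end of `check_and_failover`, which is the restart delay that distinguishes a high availability cluster from a system that never interrupts execution.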
What is fault tolerance?
A fault-tolerant system relies on either specialized hardware or a specialized hypervisor to detect a hardware failure and instantly switch to a redundant hardware component without restarting the application.
Fault-tolerant systems handle only hardware failures. They do not handle software failures, which are by far the most common cause of system downtime.
Pros and cons of high availability and fault tolerance
Software high availability cluster | Fault-tolerant system |
Product | |
SafeKit on Windows and Linux | Fault tolerant products |
Hardware / hypervisor | |
No dedicated server, no dedicated hypervisor. Works with Hyper-V, the standard and free Windows hypervisor included in Windows for servers and PCs. Works with KVM (Kernel-based Virtual Machine), the standard and free hypervisor integrated in the mainline Linux kernel. Each server can be the failover server of the other one for multiple applications. | Dedicated hardware or dedicated hypervisor. The secondary server is dedicated to executing the same application, synchronized at the instruction level. |
Software failure | |
Software failures are handled by restarting the application in another OS environment. | Software failures are not handled: a software exception occurs on both servers at the same time, on the same OS. |
Smooth upgrade/fix of application and OS | |
Yes. The application and OS can be upgraded or fixed smoothly, server by server. Versions N and N+1 can coexist. | No. Both servers run the same application and OS image. |
RTO/RPO | |
With SafeKit, the recovery time (RTO) is the time needed to detect the failure and restart the application (about 1 minute). The data loss (RPO) is zero because replication is synchronous. | The recovery time (RTO) of a fault-tolerant system is zero: the application is not restarted after a failure and continues its execution on the secondary server. The data loss (RPO) is also zero. |
Flexibility | |
Can run on any type of server with standard Windows and Linux OS | Depends on specific hardware or on specific hypervisors |
Suited for | |
Software vendors that want to add a simple high availability option to their application | Environments where hardware failures are the main concern |
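The RTO/RPO row of the table above states that the recovery time of a restart-based cluster is the sum of the detection time and the restart time. The arithmetic can be sketched as follows. The function name and all numeric inputs are hypothetical and chosen only to land on the "about 1 minute" order of magnitude quoted in the table; they are not SafeKit settings.

```python
def recovery_time_s(check_interval_s: float, failed_checks: int, restart_s: float) -> float:
    """RTO of a restart-based cluster = time to detect + time to restart."""
    detection_s = check_interval_s * failed_checks
    return detection_s + restart_s


# Hypothetical figures: a check every 5 s, failure declared after 6 failed
# checks, and an application that needs 30 s to restart.
ha_rto_s = recovery_time_s(check_interval_s=5, failed_checks=6, restart_s=30)

# A fault-tolerant system does not restart the application, so per the table
# its RTO is zero.
ft_rto_s = 0

print(ha_rto_s, ft_rto_s)
```

Shortening either the detection window or the application restart time lowers the RTO, which is why the figure depends on the application being protected.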
VM HA with the SafeKit Hyper-V or KVM module | Application HA with SafeKit application modules |
SafeKit inside 2 hypervisors: replication and failover of full VM | SafeKit inside 2 virtual or physical machines: replication and failover at application level |
Replicates more data (App+OS) | Replicates only application data |
Reboot of the VM on hypervisor 2 if hypervisor 1 crashes. Recovery time depends on the OS reboot. VM checker and failover (the virtual machine is unresponsive, has crashed, or has stopped working). | Quick recovery with restart of the application on OS 2 if server 1 crashes: around 1 minute or less (see the RTO/RPO row above). Application checker and software failover. |
Generic solution for any application / OS | Restart scripts to be written in application modules |
Works with Windows/Hyper-V and Linux/KVM but not with VMware | Platform agnostic, works with physical or virtual machines, cloud infrastructure and any hypervisor including VMware |
SafeKit with the Hyper-V module or the KVM module | Microsoft Hyper-V Cluster & VMware HA |
No shared disk: synchronous real-time replication instead, with no data loss | Shared disk and a specific external disk array |
Remote sites = no SAN for replication | Remote sites = disk arrays replicated across a SAN |
No specific IT skills needed to configure the system (with hyperv.safe and kvm.safe) | Specific IT skills needed to configure the system |
Note that the Hyper-V/SafeKit and KVM/SafeKit solutions are limited to replication and failover of 32 VMs. | Note that the Hyper-V built-in replication does not qualify as a high availability solution. This is because the replication is asynchronous, which can result in data loss during failures, and it lacks automatic failover and failback capabilities. |
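The note above on asynchronous replication and data loss can be illustrated with a small simulation. It is a simplified sketch, not a model of any product: the `lost_writes` function, the `async_lag` parameter, and the write labels are assumptions made for illustration.

```python
def lost_writes(mode: str, writes: list, crash_after: int, async_lag: int = 2) -> list:
    """Return the acknowledged writes missing on the secondary after a crash.

    mode        -- "sync" or "async"
    crash_after -- number of writes processed before the primary crashes
    async_lag   -- hypothetical number of writes the secondary may lag behind
    """
    acknowledged, secondary, pending = [], [], []
    for w in writes[:crash_after]:
        if mode == "sync":
            # Synchronous: replicate first, acknowledge afterwards.
            secondary.append(w)
            acknowledged.append(w)
        else:
            # Asynchronous: acknowledge immediately, replicate later.
            acknowledged.append(w)
            pending.append(w)
            while len(pending) > async_lag:
                secondary.append(pending.pop(0))
    return [w for w in acknowledged if w not in secondary]


writes = ["w1", "w2", "w3", "w4", "w5"]
print(lost_writes("sync", writes, crash_after=5))
print(lost_writes("async", writes, crash_after=5))
```

With synchronous replication the list of lost writes is always empty, which corresponds to an RPO of zero. With asynchronous replication, any write still in the pending queue at the time of the crash has been acknowledged to the application but never reached the secondary.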
Evidian SafeKit mirror cluster with real-time file replication and failover
- 3 products in 1
- Very simple configuration
- Synchronous replication
- Fully automated failback
- Replication of any type of data
- File replication vs disk replication
- File replication vs shared disk
- Remote sites and virtual IP address
- Quorum and split brain
- Active/active cluster
- Uniform high availability solution
- RTO / RPO
Evidian SafeKit farm cluster with load balancing and failover
- No load balancer or dedicated proxy servers or special multicast Ethernet address
- All clustering features
- Remote sites and virtual IP address
- Uniform high availability solution
New application (real-time replication and failover)
- Windows (mirror.safe)
- Linux (mirror.safe)
New application (network load balancing and failover)
Database (real-time replication and failover)
- Microsoft SQL Server (sqlserver.safe)
- PostgreSQL (postgresql.safe)
- MySQL (mysql.safe)
- Oracle (oracle.safe)
- MariaDB (sqlserver.safe)
- Firebird (firebird.safe)
Web (network load balancing and failover)
- Apache (apache_farm.safe)
- IIS (iis_farm.safe)
- NGINX (farm.safe)
Full VM or container real-time replication and failover
- Hyper-V (hyperv.safe)
- KVM (kvm.safe)
- Docker (mirror.safe)
- Podman (mirror.safe)
- Kubernetes K3S (k3s.safe)
Amazon AWS
- AWS (mirror.safe)
- AWS (farm.safe)
Google GCP
- GCP (mirror.safe)
- GCP (farm.safe)
Microsoft Azure
- Azure (mirror.safe)
- Azure (farm.safe)
Other clouds
- All Cloud Solutions
- Generic (mirror.safe)
- Generic (farm.safe)
Physical security (real-time replication and failover)
- Milestone XProtect (milestone.safe)
- Nedap AEOS (nedap.safe)
- Genetec SQL Server (sqlserver.safe)
- Bosch AMS (hyperv.safe)
- Bosch BIS (hyperv.safe)
- Bosch BVMS (hyperv.safe)
- Hanwha Vision (hyperv.safe)
- Hanwha Wisenet (hyperv.safe)
Siemens (real-time replication and failover)
- Siemens Siveillance suite (hyperv.safe)
- Siemens Desigo CC (hyperv.safe)
- Siemens Siveillance VMS (SiveillanceVMS.safe)
- Siemens SiPass (hyperv.safe)
- Siemens SIPORT (hyperv.safe)
- Siemens SIMATIC PCS 7 (hyperv.safe)
- Siemens SIMATIC WinCC (hyperv.safe)