Shared nothing architecture vs shared disk architecture for high availability clusters
Overview
This article explores the pros and cons of a shared nothing architecture vs a shared disk architecture for high availability clusters. It looks at hardware constraints, the impact on application data organization, recovery time, and simplicity of implementation.
The following comparative tables explain in detail the differences between a shared disk architecture and SafeKit, a software clustering product implementing a shared nothing architecture.
What is a shared disk architecture?
A shared disk architecture (as with a Microsoft failover cluster) is based on 2 servers sharing a disk, with automatic application failover in case of hardware or software failures.
This architecture has hardware constraints: specific external shared storage, specific cards to install inside the servers, and specific switches between the servers and the shared storage.
A shared disk architecture has a strong impact on the organization of application data. All application data must be located on the shared disk so that the application can restart after a failover.
Moreover, on failover, the file system recovery procedure must be executed on the shared disk. This increases the recovery time (RTO).
Finally, the solution is not easy to configure: hardware skills are required to set up the specific hardware, and application skills are required to move the application data onto the shared disk.
What is a shared nothing architecture?
A shared nothing architecture (as with SafeKit) is based on 2 servers replicating data in real time, with automatic application failover in case of hardware or software failures.
There are two types of data replication: byte-level file replication and block-level disk replication. We consider byte-level file replication here because it has many advantages over block-level disk replication.
The shared nothing architecture has no hardware constraints: the servers can be physical or virtual, with any type of disk organization. Real-time file replication (synchronous, so that no data is lost) goes over the standard network between the servers.
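To make the "synchronous for 0 data loss" point concrete, here is a minimal in-memory sketch. It is not SafeKit code: the `Primary` and `Replica` classes, the file path and the crash scenario are all hypothetical, and the network is replaced by a direct method call. It only illustrates the general idea that a synchronous write is acknowledged after the secondary holds the data, whereas an asynchronous write is acknowledged before.

```python
from collections import deque


class Replica:
    """In-memory stand-in for the secondary server's copy of the files."""

    def __init__(self):
        self.files = {}

    def apply(self, path, data):
        self.files[path] = data


class Primary:
    """Hypothetical primary node; `synchronous` selects the replication mode."""

    def __init__(self, replica, synchronous):
        self.replica = replica
        self.synchronous = synchronous
        self.files = {}
        self.pending = deque()  # writes not yet sent (asynchronous mode only)

    def write(self, path, data):
        """Return once the write is acknowledged to the application."""
        self.files[path] = data
        if self.synchronous:
            # Acknowledge only after the secondary holds the data as well.
            self.replica.apply(path, data)
        else:
            # Acknowledge immediately; the data is sent later by flush().
            self.pending.append((path, data))

    def flush(self):
        """Send queued writes to the secondary (asynchronous mode)."""
        while self.pending:
            self.replica.apply(*self.pending.popleft())


def acknowledged_writes_lost(synchronous):
    """Crash the primary right after one acknowledged write; count lost writes."""
    replica = Replica()
    primary = Primary(replica, synchronous)
    primary.write("/data/orders.db", b"order-1")
    # The primary crashes here, before any background flush can run.
    return len(primary.files) - len(replica.files)
```

In this toy model, `acknowledged_writes_lost(True)` is 0 because the secondary already has every acknowledged write, while the asynchronous variant loses the write that was still queued when the primary failed.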
This architecture has no impact on application data organization. For instance, if an application keeps its data on the system disk, real-time file replication still works.
Recovery time (RTO) in the event of a failover is reduced to the application restart time on the secondary server's replicated files.
Finally, the solution is very simple to configure as only the paths of directories to replicate are configured.
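The idea that only directory paths need to be configured can be sketched as a simple path filter. This is an illustration under assumptions, not the product's configuration format or implementation: the directory names, `REPLICATED_DIRS`, `is_replicated` and `writes_to_replicate` are hypothetical names chosen for the example.

```python
from pathlib import PurePosixPath

# Hypothetical configuration: the only setting is the list of directories.
REPLICATED_DIRS = [PurePosixPath("/var/lib/app/data"), PurePosixPath("/etc/app")]


def is_replicated(path, replicated_dirs=REPLICATED_DIRS):
    """Return True if `path` is a replicated directory or lies inside one."""
    p = PurePosixPath(path)
    return any(p == root or root in p.parents for root in replicated_dirs)


def writes_to_replicate(writes, replicated_dirs=REPLICATED_DIRS):
    """Keep only the (path, offset, data) writes under a replicated directory."""
    return [w for w in writes if is_replicated(w[0], replicated_dirs)]
```

With this filter, a write to `/var/lib/app/data/db/file1` would be forwarded to the secondary server, while a write to `/tmp/scratch` would stay local. Comparing whole path components (rather than string prefixes) keeps a sibling such as `/var/lib/app/database` from matching by accident.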
Pros and cons of shared nothing architecture vs shared disk architecture
Criterion | Shared nothing architecture | Shared disk architecture
Product | SafeKit | Clustering toolkit for shared disk
Extra hardware | No - uses the internal disks of the servers | Yes - extra cost of a shared disk bay
Application data organization | No impact on application data organization with SafeKit. Just define the directories to replicate in real time. Even directories inside the system disk can be replicated. | Impact on application data organization. The application needs a special configuration to put its data on a shared disk. Data on the system disk cannot be recovered.
Complexity of deployment | No - install software on 2 servers | Yes - requires specific IT skills to configure the OS and the shared disk
Failover | Just restart the application on the second server. | Switch the shared disk. Remount the file system. Run the recovery procedure on the file system. Then restart the application.
Disaster recovery | Just put the 2 servers on 2 remote sites connected by an extended LAN. | Extra cost of a second disk bay. Specific IT skills to configure mirroring of the bays across a SAN.
Quorum and split brain | The application runs on a single server after a network isolation (split brain). Data remains coherent after a split brain. No need for a third machine, a quorum disk or a special heartbeat line for split brain. | Requires a special quorum disk or a third quorum server to avoid data corruption on split brain
Suited for | Software vendors that want to add a simple high availability option to their application | Enterprises with IT skills in clustering and with large database applications
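The "Failover" row of the table above can be read as a sum of step durations. The sketch below is only a toy model of that reasoning: the step names are taken from the table, but `recovery_time`, `extra_steps` and the one-unit-per-step durations are assumptions for illustration, not measurements of any product.

```python
# Failover steps as listed in the "Failover" row of the comparison table.
SHARED_DISK_STEPS = [
    "switch the shared disk",
    "remount the file system",
    "run the file system recovery procedure",
    "restart the application",
]
SHARED_NOTHING_STEPS = ["restart the application"]


def recovery_time(steps, duration_of):
    """Sum the duration of each failover step (any consistent time unit)."""
    return sum(duration_of[step] for step in steps)


def extra_steps():
    """Steps that only the shared disk failover has to perform."""
    return [s for s in SHARED_DISK_STEPS if s not in SHARED_NOTHING_STEPS]


# Placeholder durations: one arbitrary time unit per step, not a measurement.
ONE_UNIT_PER_STEP = {step: 1 for step in SHARED_DISK_STEPS}
```

Because the shared nothing sequence is a subset of the shared disk sequence, its total can never be larger for the same per-step durations; how much smaller it is in practice depends on the real duration of the disk switch, remount and file system recovery.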
VM HA with the SafeKit Hyper-V or KVM module | Application HA with SafeKit application modules |
SafeKit inside 2 hypervisors: replication and failover of full VM | SafeKit inside 2 virtual or physical machines: replication and failover at application level |
Replicates more data (App+OS) | Replicates only application data |
Reboot of the VM on hypervisor 2 if hypervisor 1 crashes. Recovery time depends on the OS reboot. VM checker and failover (virtual machine is unresponsive, has crashed, or stopped working) | Quick recovery time with restart of the application on OS2 if server 1 crashes. Around 1 min or less (see RTO/RPO). Application checker and software failover
Generic solution for any application / OS | Restart scripts to be written in application modules |
Works with Windows/Hyper-V and Linux/KVM but not with VMware | Platform agnostic, works with physical or virtual machines, cloud infrastructure and any hypervisor including VMware |
SafeKit with the Hyper-V module or the KVM module | Microsoft Hyper-V Cluster & VMware HA |
No shared disk - synchronous real-time replication instead, with no data loss | Shared disk and specific external disk bay
Remote sites = no SAN for replication | Remote sites = replicated bays of disk across a SAN |
No specific IT skill to configure the system (with hyperv.safe and kvm.safe) | Specific IT skills to configure the system |
Note that the Hyper-V/SafeKit and KVM/SafeKit solutions are limited to replication and failover of 32 VMs. | Note that the Hyper-V built-in replication does not qualify as a high availability solution. This is because the replication is asynchronous, which can result in data loss during failures, and it lacks automatic failover and failback capabilities. |
Video content
This video first illustrates the work to be done with a shared disk architecture when the two servers of a high availability cluster must be placed on two remote sites.
Next, the video demonstrates the same use case with the SafeKit shared nothing architecture.
New application (real-time replication and failover)
- Windows (mirror.safe)
- Linux (mirror.safe)
New application (network load balancing and failover)
Database (real-time replication and failover)
- Microsoft SQL Server (sqlserver.safe)
- PostgreSQL (postgresql.safe)
- MySQL (mysql.safe)
- Oracle (oracle.safe)
- MariaDB (sqlserver.safe)
- Firebird (firebird.safe)
Web (network load balancing and failover)
- Apache (apache_farm.safe)
- IIS (iis_farm.safe)
- NGINX (farm.safe)
Full VM or container real-time replication and failover
- Hyper-V (hyperv.safe)
- KVM (kvm.safe)
- Docker (mirror.safe)
- Podman (mirror.safe)
- Kubernetes K3S (k3s.safe)
Amazon AWS
- AWS (mirror.safe)
- AWS (farm.safe)
Google GCP
- GCP (mirror.safe)
- GCP (farm.safe)
Microsoft Azure
- Azure (mirror.safe)
- Azure (farm.safe)
Other clouds
- All Cloud Solutions
- Generic (mirror.safe)
- Generic (farm.safe)
Physical security (real-time replication and failover)
- Milestone XProtect (milestone.safe)
- Nedap AEOS (nedap.safe)
- Genetec SQL Server (sqlserver.safe)
- Bosch AMS (hyperv.safe)
- Bosch BIS (hyperv.safe)
- Bosch BVMS (hyperv.safe)
- Hanwha Vision (hyperv.safe)
- Hanwha Wisenet (hyperv.safe)
Siemens (real-time replication and failover)
- Siemens Siveillance suite (hyperv.safe)
- Siemens Desigo CC (hyperv.safe)
- Siemens Siveillance VMS (SiveillanceVMS.safe)
- Siemens SiPass (hyperv.safe)
- Siemens SIPORT (hyperv.safe)
- Siemens SIMATIC PCS 7 (hyperv.safe)
- Siemens SIMATIC WinCC (hyperv.safe)