What is RPO and RTO with examples?

Evidian SafeKit

What is RPO and RTO with examples of high availability and backup solutions?

Overview

This article explores RPO (Recovery Point Objective) and RTO (Recovery Time Objective) with examples of high availability and backup solutions.

High availability and backup solutions are complementary. The first provides automatic failover in the event of a failure; the second provides data recovery after a disaster, such as ransomware encrypting all data.

The article explains in detail the RTO and RPO of SafeKit, a high availability software product.

What is RPO?

RPO (Recovery Point Objective) is the maximum amount of data that can be lost in the event of a failure, expressed as the period of time before the failure whose updates are not recovered.

If you are looking for a high availability cluster with automatic failover, the RPO should be 0, so that the application restarts without data loss. To get zero data loss, you can choose either a hardware high availability cluster with a shared disk or a software high availability cluster with synchronous real-time replication.

If you are implementing a backup solution, the RPO is greater than 0 and recovery is not automatic. Administrators decide how often to back up and how many backups to keep; any data written since the last backup is lost on restore.
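As a rough illustration of why the RPO of a backup solution is greater than 0, the sketch below computes the data loss window as the time elapsed between the last completed backup and the failure. The function names and the daily schedule are hypothetical, and the time a backup takes to complete is ignored.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Everything written after the last completed backup is lost on restore."""
    if failure < last_backup:
        raise ValueError("failure must not precede the last backup")
    return failure - last_backup


def worst_case_rpo(backup_interval: timedelta) -> timedelta:
    """Worst case: the failure occurs just before the next backup would run."""
    return backup_interval


# Hypothetical schedule: one backup per day at midnight, failure at 18:30.
loss = data_loss_window(datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 18, 30))
print("data lost on restore:", loss)
print("worst-case RPO:", worst_case_rpo(timedelta(hours=24)))
```

Backing up more often shrinks the window but never brings it to 0, which is why backup complements synchronous replication rather than replacing it.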

What is RTO?

RTO (Recovery Time Objective) is the time during which an application is unavailable in the event of a failure.

For a critical application, RTO should be minimal. This requires a high availability solution that automatically restarts the application in the event of a hardware or software failure. RTO is then approximately one minute: the failure detection time plus the automatic restart time of the application.

With a backup solution, RTO is generally several hours or more. Administrators will first attempt to repair the hardware and restart the application on up-to-date data. Restarting from a backup is the last resort, used only when the previous actions fail, because it leads to data loss.

RTO with the example of a SafeKit mirror cluster

The SafeKit mirror cluster is a software high availability cluster with synchronous real-time data replication and automatic application failover.

RTO of the SafeKit mirror cluster is on the order of 1 minute and can be reduced by configuring a shorter heartbeat timeout.

For a hardware failure, RTO = heartbeat timeout (default 30 s) + time to restart the application.

For a software failure or an administrator restart, RTO = time to stop the application + time to restart it.

With solutions that reboot a full virtual machine in case of failure, the RTO includes the reboot time of the virtual machine.
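The two mirror cluster formulas above can be written as a short calculation. This is a minimal sketch: the function names are hypothetical, and the stop and restart durations are made-up figures that depend entirely on the application. Only the 30 s default heartbeat timeout comes from the text.

```python
def mirror_rto_hardware(restart_s: float, heartbeat_timeout_s: float = 30.0) -> float:
    """Hardware failure: heartbeat timeout + time to restart the application."""
    return heartbeat_timeout_s + restart_s


def mirror_rto_software(stop_s: float, restart_s: float) -> float:
    """Software failure or administrator restart: stop time + restart time."""
    return stop_s + restart_s


# Hypothetical application that stops in 5 s and restarts in 25 s.
print("hardware failure RTO (s):", mirror_rto_hardware(restart_s=25.0))
print("software failure RTO (s):", mirror_rto_software(stop_s=5.0, restart_s=25.0))
# Effect of a shorter heartbeat timeout on the hardware failure case.
print("with a 10 s timeout (s):", mirror_rto_hardware(restart_s=25.0, heartbeat_timeout_s=10.0))
```

The calculation shows that the heartbeat timeout only affects the hardware failure case; for a software failure, RTO is driven by how fast the application itself stops and restarts.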

RTO with the example of a SafeKit farm cluster

The SafeKit farm cluster is a software high availability cluster with network load balancing and automatic failover.

RTO of a SafeKit farm cluster is in the order of a few seconds.

For a hardware failure, RTO = failure detection timeout through monitoring channels (default a few seconds). After the timeout the load balancing filters are reconfigured.

For a software failure or an administrator restart, RTO = time to stop the application + time to restart it.
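To picture what reconfiguring the load balancing filters means after a node is declared failed, here is a generic sketch of hash-based traffic distribution. It is not SafeKit's actual filter implementation: the `pick_node` function, the node names and the CRC32 hashing rule are all assumptions chosen for illustration.

```python
import zlib


def pick_node(client_ip: str, alive_nodes: list[str]) -> str:
    """Map a client to one of the nodes currently considered alive."""
    if not alive_nodes:
        raise RuntimeError("no node available")
    # Deterministic hash of the client address, reduced to a node index.
    bucket = zlib.crc32(client_ip.encode("ascii")) % len(alive_nodes)
    return sorted(alive_nodes)[bucket]


nodes = ["node1", "node2", "node3"]
print("before failure:", pick_node("10.0.0.7", nodes))

# After the failure detection timeout, node2 is removed and the mapping is
# recomputed over the remaining nodes only.
remaining = [n for n in nodes if n != "node2"]
print("after failure: ", pick_node("10.0.0.7", remaining))
```

In this simplified model, clients are served again as soon as the failed node is removed from the list, which matches the idea that the hardware failure RTO of a farm is bounded by the detection timeout.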

RPO with the example of a SafeKit mirror cluster

RPO of the SafeKit mirror cluster is 0 as the replication is synchronous and real-time.

Be careful: with asynchronous replication, RPO is not 0, and data is lost when the application restarts on the secondary server after a failure.
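The difference between synchronous and asynchronous replication can be illustrated with a toy model. This is a simplified sketch, not SafeKit's replication protocol: the `ReplicatedLog` class and its methods are hypothetical, and a real product replicates file changes over the network rather than Python lists.

```python
class ReplicatedLog:
    """Toy model of a primary server replicating writes to a secondary."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.primary = []    # writes acknowledged to the application
        self.secondary = []  # writes present on the secondary server
        self.pending = []    # asynchronous mode only: not yet shipped

    def write(self, record: str) -> None:
        self.primary.append(record)
        if self.synchronous:
            # The write is acknowledged only once the secondary has it.
            self.secondary.append(record)
        else:
            self.pending.append(record)

    def ship_pending(self) -> None:
        """Asynchronous mode: send queued writes to the secondary later."""
        self.secondary.extend(self.pending)
        self.pending.clear()

    def fail_over(self) -> list:
        """Primary is lost: return the writes missing on the secondary."""
        return self.primary[len(self.secondary):]


sync_log = ReplicatedLog(synchronous=True)
for rec in ["a", "b", "c"]:
    sync_log.write(rec)
print("synchronous, lost writes: ", sync_log.fail_over())

async_log = ReplicatedLog(synchronous=False)
async_log.write("a")
async_log.write("b")
async_log.ship_pending()
async_log.write("c")  # acknowledged, but the primary fails before shipping it
print("asynchronous, lost writes:", async_log.fail_over())
```

In the synchronous case nothing is acknowledged before it exists on both servers, so the list of lost writes is empty; in the asynchronous case every write still waiting to be shipped is lost at failover.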

RPO with the example of a SafeKit farm cluster

Not relevant: a farm cluster does not replicate any data.

What are the advantages of a mirror cluster?

  • Low Complexity
  • Plug&Play deployment with no specific skills
  • Suitable for large deployments in many sites (very simple to deploy)
  • 2 physical or virtual nodes
  • No shared storage requirement
  • No Domain Controller requirement
  • Same solution on Windows and Linux
  • Supports Windows Server and Client OS editions
  • Well documented API and support
  • Synchronous data replication (no data loss in case of failure)
  • Replicated directories can be in the system disk
  • Supports multiple heartbeats and virtual IP addresses
  • Offers configurable software, hardware and network checkers
  • Handles the split brain problem and quorum without requiring a special disk, a third machine or a dedicated link between the two servers
  • Automatic failover of application with a recovery time in the order of one minute
  • Automatic failback when a server comes back after a failure (no manual operation)
  • A very simple console for the end customer to deploy the solution and maintain it afterwards
  • Supports hardware and environment failures (20% of causes of unavailability), including the complete failure of a computer room with 2 nodes in two remote sites
  • Supports software failures (40% of causes of unavailability): software bug, regression on software update (N and N+1 versions can coexist)
  • Supports human errors (40% of causes of unavailability): the simplicity of use helps avoid administration errors on the critical application

What are the advantages of a farm cluster?

  • Low Complexity
  • Plug&Play deployment with no specific skills
  • Suitable for large deployments in many sites (very simple to deploy)
  • 2 physical or virtual nodes or more
  • No network load balancers requirement
  • No proxy server requirement (above the farm cluster)
  • No Domain Controller requirement
  • No restriction in VMware due to multicast or unicast address
  • Same solution on Windows and Linux
  • Supports Windows Server and Client OS editions
  • Well documented API and support
  • Supports multiple monitoring channels on multiple networks for server failure detection
  • Supports multiple virtual IP addresses
  • Offers configurable software, hardware and network checkers
  • Can be combined with the mirror cluster (synchronous real-time replication and failover) to implement a farm+mirror 3-tier architecture
  • Automatic failover with a recovery time in the order of a few seconds
  • Automatic failback when a server comes back after a failure (no manual operation)
  • A very simple console for the end customer to deploy the solution and maintain it afterwards
  • Supports hardware and environment failures (20% of causes of unavailability), including the complete failure of a computer room with 2 nodes in two remote sites
  • Supports software failures (40% of causes of unavailability): software bug, regression on software update (N and N+1 versions can coexist)
  • Supports human errors (40% of causes of unavailability): the simplicity of use helps avoid administration errors on the critical application

SafeKit Solutions and Quick Installation Guides

  • New application (real-time replication and failover)
  • New application (network load balancing and failover)
  • Database (real-time replication and failover)
  • Web (network load balancing and failover)
  • Full VM or container real-time replication and failover
  • Amazon AWS
  • Google GCP
  • Microsoft Azure
  • Other clouds
  • Physical security (real-time replication and failover)
  • Siemens (real-time replication and failover)

SafeKit High Availability Differentiators