Failover is the process of replacing a failing computing resource with a healthy resource. This can be done automatically or manually in order to maintain the uptime of services. The following are common types of failover.
Switchover
Switchover is the term for a manual failover, in which a human must step in to replace a failing resource. For example, a failing virtual server on a cloud platform may need to be released and replaced with a new instance launched from a backup using a management console.
Heartbeat
A form of automated failover in which a resource sends out a regular heartbeat message to let other resources know it is still alive. If a set number of heartbeats from a resource are missed, a failover is triggered. A heartbeat is a push from a resource outwards.
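The heartbeat rule above can be sketched in a few lines. This is a minimal illustration, not a production monitor: the class name `HeartbeatMonitor`, the `interval` and `max_missed` parameters, and the resource name `server-a` are all assumptions made for the example, and timestamps are passed in explicitly so the logic is easy to follow.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical monitor that tracks the last heartbeat from each resource."""

    interval: float    # expected seconds between heartbeats (assumed value)
    max_missed: int    # missed heartbeats tolerated before failover (assumed value)
    last_seen: dict = field(default_factory=dict)

    def record(self, resource: str, now: float) -> None:
        # Called when a resource pushes a heartbeat to the monitor.
        self.last_seen[resource] = now

    def missed(self, resource: str, now: float) -> int:
        # Whole heartbeat intervals that have elapsed since the last heartbeat.
        return int((now - self.last_seen[resource]) // self.interval)

    def needs_failover(self, resource: str, now: float) -> bool:
        # Failover is triggered once enough heartbeats have been missed.
        return self.missed(resource, now) >= self.max_missed


monitor = HeartbeatMonitor(interval=1.0, max_missed=3)
monitor.record("server-a", now=0.0)
print(monitor.needs_failover("server-a", now=2.5))  # two intervals elapsed
print(monitor.needs_failover("server-a", now=3.0))  # three intervals elapsed
```

Tolerating several missed heartbeats rather than one is a common way to avoid failing over on a single delayed message.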
Health Check
A health check is a pull operation that verifies the health of a resource from the outside. For example, a load balancer may perform a health check on a server by monitoring for failures to reply to client requests. Alternatively, the load balancer may make requests of its own to check if a resource is up.
Load Balancer
A load balancer is a tool for distributing workload across servers and other resources. Load balancers support failover using health checks or heartbeats. A load balancer may effectively achieve failover by removing a failed resource from its pool of active resources. A load balancer may also start new resources to replace the failed resource using autoscaling capabilities.
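As a rough sketch of how a load balancer can achieve failover through health checks, the example below keeps a pool of active resources and drops any resource whose probe fails. The `Pool` class, the probe functions and the names `web-1` and `web-2` are hypothetical; a real probe would typically be a network request rather than a dictionary lookup.

```python
from typing import Callable, Dict, List


class Pool:
    """Hypothetical load balancer pool that routes only to healthy resources."""

    def __init__(self, checks: Dict[str, Callable[[], bool]]):
        # Maps a resource name to a probe that returns True when the resource is up.
        self.checks = checks
        self.active: List[str] = list(checks)
        self._next = 0

    def run_health_checks(self) -> None:
        # Pull model: the balancer probes each resource from the outside.
        self.active = [name for name, probe in self.checks.items() if probe()]

    def route(self) -> str:
        # Round-robin over the resources that passed the last health check.
        if not self.active:
            raise RuntimeError("no healthy resources in the pool")
        name = self.active[self._next % len(self.active)]
        self._next += 1
        return name


# Simulated resource status stands in for real probes in this sketch.
status = {"web-1": True, "web-2": True}
pool = Pool({name: (lambda n=name: status[n]) for name in status})

status["web-1"] = False      # simulate a failure of web-1
pool.run_health_checks()     # web-1 is removed from the active pool
print(pool.route())          # requests now go to the remaining resource
```

Because the pool is rebuilt on every round of checks, a resource that recovers is added back automatically on the next check.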
Peer Failover
Peer failover is any architecture where resources fail over for each other as peers, as opposed to having a centralized controller such as a load balancer perform the failover. For example, two servers can send each other heartbeats and take over the other server's work when a heartbeat fails. This can also be orchestrated across a large number of peers to create highly resilient services.
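The two-server example can be sketched from the point of view of one peer. This is an illustration under simplifying assumptions: the `Peer` class, its `timeout` parameter and the work names `orders` and `billing` are invented for the example, and each peer is assumed to know in advance which work its partner owns.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    """Hypothetical peer that covers for its partner when heartbeats stop."""

    name: str
    own_work: frozenset        # work this peer always serves
    partner_work: frozenset    # work it takes over if the partner fails
    timeout: float = 3.0       # seconds of silence before takeover (assumed value)
    partner_last_seen: float = 0.0

    def receive_heartbeat(self, now: float) -> None:
        # Called when a heartbeat arrives from the partner.
        self.partner_last_seen = now

    def partner_alive(self, now: float) -> bool:
        return now - self.partner_last_seen <= self.timeout

    def serving(self, now: float) -> set:
        # Serve only our own work while the partner is alive, otherwise both.
        if self.partner_alive(now):
            return set(self.own_work)
        return set(self.own_work | self.partner_work)


a = Peer("a", own_work=frozenset({"orders"}), partner_work=frozenset({"billing"}))
a.receive_heartbeat(now=10.0)
print(a.serving(now=12.0))  # partner heard from recently
print(a.serving(now=14.0))  # partner silent for longer than the timeout
```

In this sketch the peer hands the partner's work back as soon as heartbeats resume, since `serving` is recomputed from the latest heartbeat each time.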
Standbys
Standbys are resources that perform no work until they are needed to replace a failed resource. A hot standby is a running resource with data mirrored in real time such that failover can be achieved quickly. A cold standby is a resource that is not running. In many cases, data needs to be restored from a backup to prepare a cold standby for launch.
Cloud Failover
Cloud infrastructure allows resources to be scaled up and down on demand and is ideal for performing automated or manual failovers. Cloud platforms typically provide failover functionality with load balancing services, management platforms, autoscaling tools and API gateways.
Disaster Recovery
Disaster recovery is the process of managing the risks of large failures due to disasters. Disasters may take entire data centers out of operation as opposed to single resources. As such, disaster recovery often requires architectures that can fail over all the resources in an entire region. This can be achieved with cloud architectures that allow load balancing across multiple regions. Alternatively, it can be achieved with a hot or cold site and switchover procedures for a disaster.
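Failover at the level of a whole region can be pictured as choosing the first healthy region from a priority list. The sketch below is only illustrative: the function `choose_region` and the region names are assumptions, and how a region is judged healthy is left outside the example.

```python
from typing import Dict, List


def choose_region(priority: List[str], healthy: Dict[str, bool]) -> str:
    """Return the first region in priority order that is reported healthy."""
    for region in priority:
        # A region missing from the health report is treated as unhealthy.
        if healthy.get(region, False):
            return region
    raise RuntimeError("no healthy region available")


# Hypothetical regions: a primary region and a disaster recovery region.
priority = ["region-primary", "region-dr"]
print(choose_region(priority, {"region-primary": True, "region-dr": True}))
print(choose_region(priority, {"region-primary": False, "region-dr": True}))
```

Keeping the primary region first in the list means traffic returns to it once it is reported healthy again.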
Failback
Failback is the process of repairing a failed resource and putting it back to work. This is the reverse process of failover.
© 2010-2023 Simplicable. All Rights Reserved. Reproduction of materials found on this site, in any form, without explicit permission is prohibited.