The SBC platforms support 1:1 box-level redundancy through the communication paths provided by the external High Availability (HA) Ethernet ports.
Types of platform redundancy include:
For more information on installing the HA pairs, see Installing SBC Application.
The SBC (SBC 5000/7000 series, SBC SWe) includes a CLI action command to support Geographical Redundancy High Availability (GRHA) mode, which provides enhanced protection against network degradation. GRHA mode supports active and standby servers located in two different data centers. This is done by changing the bond device monitoring from MII to ARP. ARP monitoring detects issues when the SBCs are connected to switches that are in turn connected to each other through a network. GRHA mode protects against data center and network failures. In addition to switching the bond monitoring, the leadership algorithm changes the decision on which SBC survives split-brain recovery. This change is part of GRHA mode because, in a GRHA deployment, any HA link loss is primarily due to data center isolation.
This feature is not configurable during software installation nor is it changeable during an upgrade.
The SBC platforms support bond monitoring, which is configurable using the CLI setHaConfig bondMonitoring command (described below) to change the bond device monitoring from MII to ARP. ARP monitoring detects issues when the SBCs are connected to switches that are in turn connected to each other through a network. GRHA mode protects against data center and network failures.
Bond link monitoring is not applicable to the platform.
When network issues prevent the SBCs from communicating, both nodes become active (split-brain). Once communication is re-established, one of the nodes must be restarted to become the standby again (split-brain recovery). The SBC supports an enhanced leadership algorithm, which is configurable using the CLI setHaConfig leaderElection command. This algorithm changes the decision on which SBC survives split-brain recovery. This functionality is part of GRHA mode because, in a GRHA deployment, any HA link loss is primarily due to data center isolation.
The enhanced leadership algorithm changes how the SBC decides which node survives as active: it chooses the node that was promoted to active during the split-brain. The algorithm also performs additional checks to handle situations where a node restarts, or starts for the first time, while communication is interrupted.
All nodes must use the same algorithm. The default algorithm is used until the new peer election algorithm is configured and available on both nodes.
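The survivor rule described above can be illustrated with a small sketch. This is not the SBC's implementation: the NodeState fields, the pick_survivor function, and in particular the handling of a node that started during the outage are assumptions made for illustration. Only the rule "the node promoted to active during the split-brain survives" comes from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeState:
    """Hypothetical per-node view exchanged once the HA link is restored."""
    name: str
    promoted_during_split: bool  # was standby, became active when the link dropped
    started_during_split: bool   # booted or restarted while the peer was unreachable


def pick_survivor(a: NodeState, b: NodeState) -> Optional[NodeState]:
    """Return the node that should stay active, or None if undecided."""
    # Assumed additional check: a node that (re)started during the outage
    # yields to a node that kept running.
    if a.started_during_split != b.started_during_split:
        return b if a.started_during_split else a
    # Rule from the text: prefer the node promoted to active during the split.
    if a.promoted_during_split != b.promoted_during_split:
        return a if a.promoted_during_split else b
    # No distinction possible in this sketch; a real system needs a tie-break.
    return None


original_active = NodeState("node-a", promoted_during_split=False, started_during_split=False)
promoted_standby = NodeState("node-b", promoted_during_split=True, started_during_split=False)
survivor = pick_survivor(original_active, promoted_standby)
```

In this example the survivor is node-b, the former standby. The other node would then be restarted to become the standby.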
GRHA mode network requirements are listed below:
The HA servers can be deployed to different geographical locations to meet various disaster recovery requirements. The following table lists High Availability link delays per HA pair for each platform.
In a High Availability configuration with an active and a standby (redundant) server, switch-over is completely automatic when the active server fails, and it preserves the integrity of stable calls.
A switch-over from active to redundant server can result in packet loss. Fax (and modem) calls are generally not tolerant to media interruptions. Despite the fact that some fax and modem calls may be preserved during a switch-over, it is not uncommon for fax machines and modems to terminate their transmissions as a result of a server switch-over.
Each SBC supports two primary (active) and two secondary (standby) 10 GigE media interfaces (packet ports). The standby port functionality provides redundant port protection for each of the active media interfaces. In an HA scenario, the backup CE has its own primary and secondary packet ports. See Figure 1 for a depiction of the HA port redundancy configuration.
For a depiction of media port inter-connectivity in an HA configuration, see the Management and HA Port Connections diagram on page Connecting SBC 7000 Series Ethernet and Data Cables.
Active port: An Ethernet packet port that is currently selected for use (e.g. for signaling, media, etc.); either a primary or secondary port on an active CE.
|A port that is in the active state is not necessarily "up".|
Local standby port: A standby port on an active CE providing redundancy protection to the currently active port.
|A port's role (Primary/Secondary) is independent of the port's state (Active/Standby).|
The SBC can perform link detection on standby and active Ethernet ports to help determine the health of the standby port before initiating a switchover/failover. The intent is to allow simple connectivity checking that tests the ability of the SBC to send/receive Ethernet frames, its connectivity to the adjacent switch/router, and the ability of the switches/router to do basic layer 2 receiving/forwarding/sending.
The following probing mechanisms are available on the SBC platforms:
|Probing Mechanism||SBC Platforms||Affected Ports||Purpose|
|Physical link detection|
|ICMP ping||All||Active ports only||Checks two-way connectivity between the port and the configured destination (adjacent router) by sending ICMP Ping messages at configured intervals to the destination.|
|ARP ACD/ICMPv6 NUD*||SBC 7000 only||Standby ports only||Verifies physical media by checking two-way traffic through at least the local Ethernet interface, the cable, and the adjacent layer 2/3 switching function.|
Layer 3 verification is accomplished using ARP ACD Probes (for IPv4) or Neighbor Discovery (for IPv6) mechanisms to probe an arbitrary, operator-specified target IP address on a local IP subnet, typically the address of the next-hop router (Gateway IP address). Depending on the address family (IPv4/IPv6) of the gateway IP address configured, either ARP ACD or ICMPv6 NUD probing messages are sent in such a way that explicit assignments of IP addresses to the standby ports are not required. See below for specifics on IPv4 ARP ACD requirements.
When the IP Target is set to 0.0.0.0 and/or “probeOnStandby” is disabled, only the physical link state between the active/standby port and the adjacent router is monitored.
* Address Resolution Protocol - Address Conflict Detection / Internet Control Message Protocol Version 6 – Neighbor Unreachability Detection
If the destination address configured is an IPv4 address, then IPv4 probing is initiated by sending ARP Probe requests and listening for the responses.
ARP Request probes are sent with:
The target is required to respond to the ARP probe with an ARP Response that has an L2 unicast MAC as both the DESTINATION and the SOURCE. If the target replies with a GARP or an ARP request in the form of a broadcast, the SBC drops these packets because of the DDoS functionality enabled in the application code.
Refer to Link Detection Group - CLI for the command to disable probe functionality on the standby port if the router cannot reply to the ARP probe with a unicast destination MAC address.
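As an illustration of the IPv4 probing described above, the following sketch builds an ARP Probe in the RFC 5227 format (sender IP address all zeros, target hardware address all zeros) and checks a reply for the unicast requirement. It is a minimal sketch, not the SBC's code: the function names are hypothetical, and the reply check covers only the conditions named in this section.

```python
import ipaddress
import struct

ETHERTYPE_ARP = 0x0806
BROADCAST = b"\xff" * 6


def mac_to_bytes(mac: str) -> bytes:
    return bytes.fromhex(mac.replace(":", ""))


def build_arp_probe(src_mac: str, target_ip: str) -> bytes:
    """Build an Ethernet frame carrying an RFC 5227 ARP Probe."""
    sha = mac_to_bytes(src_mac)
    tpa = ipaddress.IPv4Address(target_ip).packed
    eth = BROADCAST + sha + struct.pack("!H", ETHERTYPE_ARP)
    # htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4, oper=1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender IP is 0.0.0.0, so the probing port needs no IP address of its own.
    arp += sha + b"\x00" * 4 + b"\x00" * 6 + tpa
    return eth + arp


def is_acceptable_reply(frame: bytes, our_mac: str, target_ip: str) -> bool:
    """Accept only an ARP reply unicast to us from the probed address."""
    if len(frame) < 42:
        return False
    dst, src = frame[0:6], frame[6:12]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    (oper,) = struct.unpack("!H", frame[20:22])
    sender_ip = frame[28:32]
    return (
        ethertype == ETHERTYPE_ARP
        and oper == 2                       # ARP reply, not a request or GARP
        and dst == mac_to_bytes(our_mac)    # unicast destination (our MAC)
        and (src[0] & 0x01) == 0            # unicast source MAC
        and sender_ip == ipaddress.IPv4Address(target_ip).packed
    )


probe = build_arp_probe("02:00:00:00:00:01", "192.0.2.1")
```

A reply sent to the broadcast address fails the destination check in this sketch, which mirrors the behavior described above for routers that answer with a broadcast.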
If the destination address configured is an IPv6 address, then IPv6 probing is initiated using the Neighbor Unreachability Detection mechanism (RFC 4861 section 7). This mechanism is based on Neighbor Solicitation and Neighbor Advertisement ICMPv6 messages.
Because these are IP packets, the SBC needs IP addresses to send and receive them. The SBC uses a link-local IPv6 address that is auto-generated from the current local MAC address.
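One common way to derive a link-local IPv6 address from a MAC address is the modified EUI-64 method of RFC 4291 (Appendix A). The sketch below assumes that method; the source does not state which derivation the SBC uses, and the function name is hypothetical.

```python
import ipaddress


def link_local_from_mac(mac: str) -> str:
    """Derive an fe80::/64 address from a 48-bit MAC using modified EUI-64."""
    raw = bytes.fromhex(mac.replace(":", ""))
    if len(raw) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Flip the universal/local bit of the first octet and insert ff:fe
    # between the two halves of the MAC address.
    eui64 = bytes([raw[0] ^ 0x02]) + raw[1:3] + b"\xff\xfe" + raw[3:6]
    # fe80::/64 prefix: fe80 followed by 48 zero bits, then the interface ID.
    packed = bytes.fromhex("fe80") + b"\x00" * 6 + eui64
    return str(ipaddress.IPv6Address(packed))


address = link_local_from_mac("00:11:22:33:44:55")
```

With the example MAC address, the result is fe80::211:22ff:fe33:4455.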
Neighbor Solicitation messages are sent with:
The Neighbor Solicitation message is sent on the LAN via L2 unicast to the system with the target IP address.
The target can be expected to respond with a Neighbor Advertisement using L2 unicast. Received messages are validated per RFC 4861 section 7.1.2: Check that the S bit = 1 (solicited) and that the target address = our configured target IP address.
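The two checks named above (S bit set, target address matches) can be sketched as follows. The function name is hypothetical, the input is assumed to be the ICMPv6 message without the IPv6 header, and the remaining RFC 4861 section 7.1.2 checks (for example hop limit and checksum) are deliberately left out.

```python
import ipaddress
import struct

NEIGHBOR_ADVERTISEMENT = 136   # ICMPv6 type
SOLICITED_FLAG = 0x40000000    # S bit in the 32-bit flags/reserved word


def is_valid_probe_reply(icmp6: bytes, configured_target: str) -> bool:
    """Check an ICMPv6 message for the S bit and the expected target address."""
    if len(icmp6) < 24:
        return False
    # Layout: type (1), code (1), checksum (2), flags + reserved (4), target (16)
    msg_type, code, _checksum, flags = struct.unpack("!BBHI", icmp6[:8])
    target = ipaddress.IPv6Address(icmp6[8:24])
    return (
        msg_type == NEIGHBOR_ADVERTISEMENT
        and code == 0
        and bool(flags & SOLICITED_FLAG)
        and target == ipaddress.IPv6Address(configured_target)
    )


# Example: a solicited advertisement (S and O bits set) for fe80::1.
advert = struct.pack("!BBHI", 136, 0, 0, 0x60000000) + ipaddress.IPv6Address("fe80::1").packed
accepted = is_valid_probe_reply(advert, "fe80::1")
```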
The SBC may reduce the call acceptance rate when syncing from the active to the standby CE under full load, causing some calls to be rejected with a 503 message even when the applied load is below the specified maximum call rate. This condition clears once the synchronization to the standby completes. Additionally, some calls may be rejected with a 503 message when synchronization occurs while the applied load is near the specified maximum.
The impact of a link or switch failure on the SBC is depicted in the diagram below.
Most switches, in their default configuration, forward ARP Probes without issue. However, if a switch has ARP inspection/filtering functionality enabled, that feature must not discard RFC 5227 ARP Probes as “invalid”; otherwise, it cannot be configured on ports connected to the SBC media ports.
The standby ports do not have IP addresses while in standby mode, so they cannot generate ICMP Echo Requests for Sonus Link Detection purposes. If configured to provide a similar logical connectivity check, the standby ports instead send an ARP Probe to the target IP address (see Address Context - Link Detection Group (EMA) or Link Detection Group - CLI for details on configuring Link Detection).
By default, most switches forward ARP Probes like any other traffic. Some switches have features (for example, Dynamic ARP Inspection or Dynamic ARP Protection) that, when enabled, discard “invalid” ARP packets. These features may incorrectly treat RFC 5227 ARP Probes as “invalid” ARP packets. If such a feature is enabled, it must not discard RFC 5227 ARP Probes as “invalid”; otherwise, it cannot be configured on the switch ports connected to the SBC media ports.
The SBC includes the probeOnStandby flag for disabling ARP/NUD probing by Link Monitors on standby packet ports in case routers in your network do not respond correctly to the ARP probes. Without this option, such routers can cause a Link Monitor to declare itself failed. The CLI syntax is shown below (the default value of probeOnStandby is 'enabled').
|Disabling ARP/NUD probing can possibly lead to a toggling situation since we rely only on the physical port health on the standby packet port.|
When the probeOnStandby flag is disabled, a toggling situation is possible: the Link Monitor on the active port detects failures via ICMP ping to a destination, while on the standby packet port it can only use the physical health of the port. Therefore, once a port becomes standby, it can look healthy as long as the physical port is up; however, when it becomes active and fails to reach the configured destination, it looks unhealthy again.
|The Port Redundancy Group includes a mechanism to detect a scenario where link failures begin rapidly toggling between the active and standby packet ports. If this scenario occurs, packet port redundancy continues for physical port failures, but not for link failures reported by the Link Monitors.|
% set addressContext <addressContext_name> linkDetectionGroup <LDG_name> linkMonitor <name> probeOnStandby <disabled | enabled>
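The toggling detection mentioned in the note above can be pictured as a sliding-window counter over switchovers caused by Link Monitor failures. The sketch below is purely illustrative: the class name, the threshold of three switchovers, and the 60-second window are assumptions, since the source does not state how the Port Redundancy Group detects toggling.

```python
from collections import deque


class ToggleDetector:
    """Flags rapid port toggling using a sliding time window (hypothetical values)."""

    def __init__(self, max_switchovers: int = 3, window_s: float = 60.0) -> None:
        self.max_switchovers = max_switchovers
        self.window_s = window_s
        self._times = deque()

    def record_switchover(self, now: float) -> bool:
        """Record a link-failure switchover; return True if toggling is detected."""
        self._times.append(now)
        # Forget switchovers that fall outside the window.
        while self._times and now - self._times[0] > self.window_s:
            self._times.popleft()
        return len(self._times) > self.max_switchovers


detector = ToggleDetector(max_switchovers=3, window_s=60.0)
results = [detector.record_switchover(t) for t in (0.0, 10.0, 20.0, 30.0)]
```

In this example the fourth switchover within 60 seconds trips the detector. A system using such a check could then stop acting on Link Monitor failures while still failing over on physical port failures, as the note describes.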
The different aspects of redundancy performance depend on the size of the configuration. The following configuration profiles are defined to reduce the number of test combinations.
The SBC platform supports 1:1 box-level redundancy. Full HA protection can be restored after switchover and virgin standby start states.
The above time to HA protection only applies when replacing a failed chassis with completely new hardware.
To meet the redundancy performance requirements, HA connectivity between the active and standby nodes in an HA pair must meet certain delay and packet loss metrics. These metrics are the same as for the platform and are summarized in the following table.
The SBC platform uses health-checking and hot-standby techniques to efficiently detect faults and to recover media, signaling, and management connectivity with minimal external effect, regardless of the current rate or capacity loading of the system.
After a fault is detected and the system switches over, the behavior of the SBC with respect to call and registration signaling is as follows:
Management interfaces are available within two seconds after a switch-over. Specifically, it is possible to log in through the management interfaces on the activated standby within two seconds.
The media recovery time depends on the failure mode and whether the calls are pass-through or transcoded. The following table shows the worst-case media recovery time for different conditions:
The SBC may reduce the call acceptance rate when syncing from the active to the standby CE under full load, causing some calls to be rejected with a 503 message even when the applied load is below the specified maximum call rate. This condition clears once the synchronization to the standby completes. Additionally, some calls may be rejected with a 503 message when synchronization occurs while the applied load is near the specified maximum.
The following table lists the maximum time for the SBC to fail over to the Local-Standby Port after a hardware port or link detection failure.