23 Mar Azure ExpressRoute – Part 3
This is the third part of a multi-part article series reviewing some of the most important aspects of the Microsoft Azure ExpressRoute service.
In part 2 of this series, we explored some of the architectural elements involved in setting up an ExpressRoute circuit. In this article, we will look at the concepts of high availability (HA) and disaster recovery (DR) for ExpressRoute circuits.
ExpressRoute Circuit(s) Availability
Azure ExpressRoute is an enterprise-class connectivity service that Microsoft has designed with failure in mind. However, ExpressRoute connectivity spans customer, connectivity provider, and Microsoft infrastructure, so some components lie outside Microsoft's control and must be designed the same way to achieve end-to-end availability.
The following two sections discuss ExpressRoute (ER) availability constructs and the considerations for ensuring the availability of each:
ExpressRoute High Availability
ExpressRoute is designed for high availability to provide carrier-grade private network connectivity to the Microsoft cloud; that is, there is no single point of failure in an ExpressRoute circuit within the Microsoft network. The qualifier “within the Microsoft network” is important here! To build in redundancy, each ExpressRoute circuit consists of two redundant connections (primary and secondary). The physical connection (for example, an optical fiber) provided by the ER provider is terminated on a layer 1 (L1) device, as shown in the diagram below. This part of the connectivity is sometimes referred to as the “first mile” physical layer design:
Note: ExpressRoute can be terminated on either Customer Edge (CE) routing devices (a layer 2 connectivity arrangement) or Partner Edge (PE) routing devices (a managed layer 3 connection service). With layer 2 connectivity, customers are responsible for configuring and managing routing.
The connectivity can be broken down further to show how the two virtual connections (a redundant pair of cross connections) within each ExpressRoute circuit provide high availability for a single ER circuit. The two Ethernet virtual circuits are tagged with different VLAN IDs, one for the primary connection and one for the secondary. Those VLAN IDs are carried in the outer 802.1Q Ethernet header, while the inner 802.1Q header is mapped to a specific ExpressRoute routing domain / BGP session (shown as RED and BLUE in the following image):
As we can see, each ExpressRoute circuit consists of two connections from the connectivity provider / your network edge to two Microsoft Enterprise Edge routers (MSEEs). Microsoft requires dual BGP connections from the connectivity provider, one to each MSEE. You may choose not to deploy redundant devices / Ethernet circuits at your end; however, connectivity providers use redundant devices to ensure that your connections are handed off to Microsoft in a redundant manner. A redundant layer 3 connectivity configuration is required for Microsoft’s SLA to be valid.
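To make the double VLAN tagging described above more concrete, here is a minimal Python sketch that builds and parses a double-tagged (QinQ) Ethernet header. The VLAN IDs (100/101 for the primary/secondary connections, 200/300 for the routing domains) and MAC addresses are hypothetical. The outer TPID is also an assumption: IEEE 802.1ad assigns 0x88A8 to the outer (service) tag, but some deployments use 0x8100 for both tags, so the sketch takes it as a parameter.

```python
import struct

# Tag Protocol Identifiers. 0x8100 is the 802.1Q (customer) tag; 0x88A8 is the
# 802.1ad (service) tag. Which one a provider uses for the outer tag is an assumption.
TPID_8021Q = 0x8100
TPID_8021AD = 0x88A8
ETHERTYPE_IPV4 = 0x0800


def _tci(vlan_id: int, pcp: int = 0, dei: int = 0) -> int:
    """Pack Tag Control Information: 3-bit PCP, 1-bit DEI, 12-bit VLAN ID."""
    if not 0 < vlan_id < 4095:
        raise ValueError(f"VLAN ID out of range: {vlan_id}")
    return (pcp << 13) | (dei << 12) | vlan_id


def build_qinq_header(dst_mac: bytes, src_mac: bytes, outer_vid: int,
                      inner_vid: int, outer_tpid: int = TPID_8021AD,
                      ethertype: int = ETHERTYPE_IPV4) -> bytes:
    """Return a 22-byte double-tagged Ethernet header (no payload, no FCS)."""
    return (dst_mac + src_mac
            + struct.pack("!HH", outer_tpid, _tci(outer_vid))
            + struct.pack("!HH", TPID_8021Q, _tci(inner_vid))
            + struct.pack("!H", ethertype))


def parse_vlan_ids(header: bytes) -> tuple[int, int]:
    """Extract (outer VLAN ID, inner VLAN ID) from a double-tagged header."""
    _, outer_tci, _, inner_tci = struct.unpack("!HHHH", header[12:20])
    return outer_tci & 0x0FFF, inner_tci & 0x0FFF


# Hypothetical VLAN plan: the outer tag selects the primary/secondary connection,
# the inner tag selects the routing domain.
OUTER = {100: "primary", 101: "secondary"}
INNER = {200: "private peering", 300: "Microsoft peering"}

hdr = build_qinq_header(b"\xaa" * 6, b"\xbb" * 6, outer_vid=101, inner_vid=200)
outer, inner = parse_vlan_ids(hdr)
print(len(hdr), OUTER[outer], INNER[inner])  # 22 secondary private peering
```

The same inner tag can appear under both outer tags, which is how one routing domain (one BGP session per connection) is carried over both the primary and the secondary connection.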
Therefore, to achieve high availability of the end-to-end path and maximize ER availability, the customer and service provider segments of an ExpressRoute circuit must also be explicitly architected for high availability. This means maintaining redundancy and avoiding single points of failure (SPOFs) within the on-premises network as well as within the service provider’s network.
For each ExpressRoute connection, plan for:
- Redundant Customer Edge (CE) devices (routers / switches, ports)
- Redundant power and cooling for the network devices
Terminating both the primary and secondary connections of an ExpressRoute circuit on the same provider device / Customer Premises Equipment (CPE) compromises high availability within the on-premises network. Additionally, configuring both the primary and secondary connections on the same port of a CPE (either by terminating the two connections on different sub-interfaces or by merging the two connections within the partner network) forces the partner to compromise high availability on their network segment as well. This is depicted in the figure below:
The image below illustrates the recommended way to connect an ExpressRoute circuit to maximize its availability: the primary and secondary connections are terminated on separate Partner Edge devices, which are in turn connected to separate Customer Edge devices within the customer’s on-premises data centres:
ER connectivity is not lost if one of the cross connections fails; the redundant connection carries the network load and provides high availability within a single ExpressRoute circuit.
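The termination rules above lend themselves to a simple design check. The following Python sketch is illustrative only: the `Termination` model and device names are hypothetical, and it assumes `ce_port` records the physical port (so two sub-interfaces on one port count as the same port).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Termination:
    """Where one connection of an ExpressRoute circuit lands (hypothetical model)."""
    pe_device: str  # partner edge device
    ce_device: str  # customer edge device
    ce_port: str    # physical port on the customer edge device


def single_points_of_failure(primary: Termination, secondary: Termination) -> list[str]:
    """List the components shared by the primary and secondary connections."""
    issues = []
    if primary.pe_device == secondary.pe_device:
        issues.append(f"shared partner edge device {primary.pe_device}")
    if primary.ce_device == secondary.ce_device:
        issues.append(f"shared customer edge device {primary.ce_device}")
        if primary.ce_port == secondary.ce_port:
            issues.append(f"shared port {primary.ce_port} on {primary.ce_device}")
    return issues


# Recommended layout: separate PE devices feeding separate CE devices.
print(single_points_of_failure(Termination("pe-1", "ce-1", "eth1"),
                               Termination("pe-2", "ce-2", "eth1")))  # []
# Both connections merged onto one CE port (e.g. via sub-interfaces).
print(single_points_of_failure(Termination("pe-1", "ce-1", "eth1"),
                               Termination("pe-1", "ce-1", "eth1")))
```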
The network on the Microsoft Azure side is configured to operate the primary and secondary connections of ExpressRoute circuits in active-active mode. However, through on-premises route advertisements, customers can force the redundant connections of an ExpressRoute circuit to operate in active-passive mode. The following are common techniques used to make one ExpressRoute path preferred over the other:
- Advertising more specific routes
- BGP AS path prepending
To improve high availability, Microsoft recommends operating both connections of an ExpressRoute circuit in active-active mode. When the two connections operate in active-active mode, the Microsoft network load-balances traffic across the connections on a per-flow basis.
Running the primary and secondary connections of an ExpressRoute circuit in active-passive mode poses the risk of both connections failing after a failure in the active path. The common causes of failure during switchover are a lack of active management of the passive connection and the passive connection advertising stale routes.
In contrast, running the primary and secondary connections of an ExpressRoute circuit in active-active mode means that only about half of the flows fail and get rerouted after an ExpressRoute connection failure. Active-active mode therefore significantly improves the Mean Time To Recover (MTTR).
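The "about half the flows" figure follows from per-flow load balancing. The Python sketch below assumes a generic 5-tuple hash (SHA-256 here) as a stand-in; it is not Microsoft's actual MSEE hashing algorithm. Each flow is pinned to one connection, so when one connection fails, only the flows pinned to it move.

```python
import hashlib

# A flow is identified by its 5-tuple: src IP, dst IP, src port, dst port, protocol.
FiveTuple = tuple[str, str, int, int, str]


def pick_link(flow: FiveTuple, links: list[str]) -> str:
    """Map a flow to one link by hashing its 5-tuple, so a flow always uses the same link."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return links[int.from_bytes(digest[:4], "big") % len(links)]


def rerouted_fraction(flows: list[FiveTuple], links: list[str], failed: str) -> float:
    """Fraction of flows whose link changes when the `failed` link goes down."""
    healthy = [link for link in links if link != failed]
    moved = sum(pick_link(f, links) != pick_link(f, healthy) for f in flows)
    return moved / len(flows)


# Hypothetical flows from one on-premises host to one Azure VM.
flows = [("10.1.11.5", "10.10.0.4", 40000 + i, 443, "tcp") for i in range(1000)]
links = ["primary", "secondary"]
on_primary = sum(pick_link(f, links) == "primary" for f in flows) / len(flows)
print(f"on primary: {on_primary:.1%}")
print(f"rerouted when secondary fails: {rerouted_fraction(flows, links, 'secondary'):.1%}")
```

With exactly two connections, the rerouted share equals the share that was on the failed connection. With more than two links, plain modulo hashing would also reshuffle some flows on healthy links, which is why this sketch should be read only as an illustration of the two-connection case.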
Note: During maintenance activities, or in case of unplanned events impacting one of the connections, Microsoft prefers to use AS path prepending to drain traffic over to the healthy connection. Customers need to ensure that traffic can route over the healthy path when Microsoft configures path prepending, and that the required route advertisements are configured appropriately to avoid any service disruption.
Multiple (Parallel) ExpressRoute Circuits – Disaster Recovery for ER
An ExpressRoute circuit’s peering / meet-me location is pinned to a geographical location and could therefore be impacted by a catastrophic failure affecting the entire location.
An entire regional service (whether that of Microsoft, network service providers, the customer, or other cloud service providers) can become degraded, for example as a result of a natural disaster. To protect against such failures, more than one ExpressRoute circuit can be created, each in a different peering / meet-me location, to achieve circuit-level resilience (disaster recovery). Microsoft strongly recommends that customers set up at least two ExpressRoute circuits to avoid single points of failure.
When designing ExpressRoute connectivity for disaster recovery, consider the following:
- Deploy geo-redundant ExpressRoute circuits, i.e., redundant ExpressRoute circuits, each from a different peering / meet-me location.
- Use diverse service provider networks for different ExpressRoute circuits.
- Design each ExpressRoute circuit for high availability (covered in the previous section).
- Terminate the different ExpressRoute circuits in different on-premises locations / data centers.
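The checklist above can be expressed as a small design review. This Python sketch is a hypothetical model (the `Circuit` fields, provider names, and site names are illustrative; only the Sydney / Sydney2 peering locations come from this article) that flags pairs of circuits sharing a peering location, provider, or on-premises site.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Circuit:
    """Hypothetical summary of one ExpressRoute circuit in a DR design."""
    name: str
    peering_location: str  # e.g. "Sydney", "Sydney2"
    provider: str
    onprem_site: str
    ha_ready: bool         # primary/secondary terminate on separate devices


def dr_findings(circuits: list[Circuit]) -> list[str]:
    """Return design findings; an empty list means no checklist item is violated."""
    findings = []
    if len(circuits) < 2:
        findings.append("fewer than two circuits: no circuit-level resilience")
    for c in circuits:
        if not c.ha_ready:
            findings.append(f"{c.name}: not designed for high availability")
    for a, b in combinations(circuits, 2):
        for attr in ("peering_location", "provider", "onprem_site"):
            if getattr(a, attr) == getattr(b, attr):
                findings.append(f"{a.name}/{b.name}: same {attr} ({getattr(a, attr)})")
    return findings


design = [
    Circuit("er-primary", "Sydney", "provider-a", "dc-1", True),
    Circuit("er-secondary", "Sydney2", "provider-b", "dc-2", True),
]
print(dr_findings(design))  # []
```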
Multiple ExpressRoute circuits from different peering locations (e.g., Sydney and Sydney2), or up to four circuits from the same peering location, can be connected to the same Azure VNet to provide high availability if a single ER circuit becomes unavailable. In a hybrid connectivity model with redundant ExpressRoute circuits, a higher weight can be assigned to one of the local connections to prefer a specific ER circuit over the others.
By default, if identical routes are advertised over multiple ExpressRoute circuits, Azure load-balances on-premises-bound traffic across all the circuits using equal-cost multi-path (ECMP) routing. However, with geo-redundant ExpressRoute circuits, bear in mind that different network paths can offer different performance, particularly network latency. To get predictable and consistent network performance during normal operation, customers may want to prefer the ExpressRoute circuit that offers the lowest latency.
Customers can influence Azure to prefer one ExpressRoute circuit over another using one of the following techniques (listed in order of effectiveness):
- Advertising more specific routes over the preferred ExpressRoute circuit than over the other ExpressRoute circuit(s).
- Configuring a higher connection weight on the connection that links the virtual network to the preferred ExpressRoute circuit. This is an Azure-side setting, configured on the ExpressRoute connection.
- Advertising the routes over the less preferred ExpressRoute circuit with a longer AS path (AS path prepending).
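The order of effectiveness above can be read as a sequence of tie-breakers. The following Python sketch is a simplified model, not Azure's implementation: it considers only prefix length, connection weight (attached to each route here for convenience, although in Azure it is a property of the connection), and AS path length, and treats any remaining tie as ECMP.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv4Network


@dataclass(frozen=True)
class Route:
    prefix: IPv4Network
    circuit: str                   # ExpressRoute circuit the route was learned over
    weight: int = 0                # connection weight (Azure default is 0)
    as_path: tuple[int, ...] = ()


def select_paths(dest: IPv4Address, routes: list[Route]) -> list[str]:
    """Return the circuit(s) used for `dest`; more than one means ECMP."""
    candidates = [r for r in routes if dest in r.prefix]
    if not candidates:
        return []
    # 1. Longest prefix match.
    longest = max(r.prefix.prefixlen for r in candidates)
    candidates = [r for r in candidates if r.prefix.prefixlen == longest]
    # 2. Highest connection weight.
    top_weight = max(r.weight for r in candidates)
    candidates = [r for r in candidates if r.weight == top_weight]
    # 3. Shortest AS path.
    shortest = min(len(r.as_path) for r in candidates)
    candidates = [r for r in candidates if len(r.as_path) == shortest]
    # 4. Remaining ties are load-balanced (ECMP).
    return sorted({r.circuit for r in candidates})


dest = IPv4Address("10.1.11.10")
net = IPv4Network("10.1.11.0/24")
print(select_paths(dest, [Route(net, "Primary", as_path=(345,)),
                          Route(net, "Secondary", as_path=(345,))]))  # ['Primary', 'Secondary']
print(select_paths(dest, [Route(net, "Primary", weight=100, as_path=(345,)),
                          Route(net, "Secondary", as_path=(345,))]))  # ['Primary']
```

Because each step only breaks ties left by the previous one, a more specific prefix wins even against a higher weight, and a higher weight wins even against a shorter AS path.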
The image below shows redundant ExpressRoute connectivity (parallel paths) established between the customer’s on-premises locations and a single Azure VNet in the Australia East region. (For simplicity, I’ve considered only a single Azure region; however, the same principle applies when designing a multi-region Azure deployment.) It shows how advertising more specific routes helps control the traffic flow over parallel ER paths:
Because a /25 prefix is more specific than a /24, Azure sends traffic destined to 10.1.11.0/24 via Primary ExpressRoute in the normal state. If both connections of the Primary ExpressRoute circuit go down, the VNet sees the 10.1.11.0/24 route advertisement only via Secondary ExpressRoute, and therefore the standby circuit is used in this failure state.
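Since the figure is not reproduced here, the exact advertisements are an assumption: this Python sketch supposes the primary circuit advertises both /25 halves of 10.1.11.0/24 while the secondary advertises only the /24. It uses the standard `ipaddress` module to perform a longest-prefix match over the routes learned from circuits that are currently up.

```python
from ipaddress import ip_address, ip_network
from typing import Optional

# Hypothetical advertisements received by the VNet, keyed by circuit.
ADVERTISEMENTS = {
    "Primary": [ip_network("10.1.11.0/25"), ip_network("10.1.11.128/25")],
    "Secondary": [ip_network("10.1.11.0/24")],
}


def next_circuit(dest: str, up: set[str]) -> Optional[str]:
    """Longest-prefix match over routes learned from circuits that are up."""
    addr = ip_address(dest)
    best = None  # (prefix length, circuit)
    for circuit, prefixes in ADVERTISEMENTS.items():
        if circuit not in up:
            continue  # routes from a failed circuit are withdrawn
        for prefix in prefixes:
            if addr in prefix and (best is None or prefix.prefixlen > best[0]):
                best = (prefix.prefixlen, circuit)
    return best[1] if best else None


for host in ("10.1.11.10", "10.1.11.200"):
    normal = next_circuit(host, {"Primary", "Secondary"})
    failure = next_circuit(host, {"Secondary"})
    print(host, normal, failure)
# 10.1.11.10 Primary Secondary
# 10.1.11.200 Primary Secondary
```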
The Connection Weight property of the Azure ExpressRoute connection object can be used to direct Azure to route traffic over selected (preferred) path:
The default connection weight is 0. Setting the weight of the connection for Primary ExpressRoute (100) higher than that of Secondary ExpressRoute influences Azure to use Primary ExpressRoute when sending traffic back to on-premises. When a VNet receives a route prefix advertised via more than one ExpressRoute circuit, it prefers the connection with the highest weight:
If both connections of Primary ExpressRoute go down, the VNet sees the 10.1.11.0/24 route advertisement only via Secondary ExpressRoute, and therefore the standby circuit is used in this failure state.
Finally, AS path prepending is shown below. The route advertisement over Primary ExpressRoute reflects the default eBGP behaviour. On the route advertisement over Secondary ExpressRoute, the on-premises network’s ASN (345 345) is additionally prepended to the route’s AS path (345). When the same route is received through multiple ExpressRoute circuits, the VNet prefers the route with the shortest AS path, per the eBGP route selection process:
If both connections of Primary ExpressRoute go down, the VNet sees the 10.1.11.0/24 route advertisement only via Secondary ExpressRoute. Consequently, the longer AS path becomes irrelevant, and the standby ER circuit is used in this failure state.
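A minimal Python sketch of the prepending example, reading it (as an assumption) as two extra copies of ASN 345 in front of the original path, so the secondary advertisement carries 345 345 345. It models only the AS path length comparison, not the full BGP decision process.

```python
ON_PREM_ASN = 345  # on-premises ASN from the example


def prepend(as_path: list[int], asn: int, times: int) -> list[int]:
    """Return a copy of `as_path` with `asn` prepended `times` extra times."""
    return [asn] * times + as_path


def preferred_circuit(received: dict[str, list[int]]) -> str:
    """Pick the circuit whose copy of the route has the shortest AS path.

    Equal lengths are not handled here; Azure would load-balance (ECMP) in that case.
    """
    return min(received, key=lambda circuit: len(received[circuit]))


received = {
    "Primary": [ON_PREM_ASN],                             # default eBGP advertisement
    "Secondary": prepend([ON_PREM_ASN], ON_PREM_ASN, 2),  # 345 345 345
}
print(received["Secondary"], preferred_circuit(received))  # [345, 345, 345] Primary

del received["Primary"]  # both Primary connections down: route withdrawn
print(preferred_circuit(received))  # Secondary
```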
When using any of the above three techniques to influence Azure to prefer one ExpressRoute circuit over the other(s), also ensure that the on-premises network prefers the same ExpressRoute path for Azure-bound traffic to avoid asymmetric flows. Typically, the local preference value is used to make the on-premises network prefer one ExpressRoute circuit over the others. Local preference is a BGP attribute exchanged only within an autonomous system (over iBGP); the route with the highest local preference value is preferred.
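A quick way to reason about symmetry is to compare the circuit Azure prefers with the circuit the on-premises routers prefer by local preference. In this Python sketch the local preference values (200 and 100) and circuit names are hypothetical, and only the local preference step of BGP selection is modelled.

```python
def onprem_preferred(local_pref: dict[str, int]) -> set[str]:
    """Circuits preferred on-premises: those with the highest local preference.

    A tie means later BGP tie-breakers decide, which this sketch does not model.
    """
    top = max(local_pref.values())
    return {circuit for circuit, pref in local_pref.items() if pref == top}


def is_symmetric(azure_preferred: set[str], local_pref: dict[str, int]) -> bool:
    """True when Azure and on-premises prefer the same circuit(s) for their traffic."""
    return azure_preferred == onprem_preferred(local_pref)


azure_preferred = {"Primary"}  # e.g. via more specific routes, weight, or prepending
print(is_symmetric(azure_preferred, {"Primary": 200, "Secondary": 100}))  # True
print(is_symmetric(azure_preferred, {"Primary": 100, "Secondary": 200}))  # False
```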
Note: Microsoft does not support router redundancy protocols (for example, HSRP, VRRP) for high availability configurations. A redundant pair of BGP sessions per peering is the only supported method for high availability.
Important: When using ExpressRoute circuits in a standby configuration, it is recommended to actively manage them and periodically test failover.