Azure ExpressRoute – Part 4

This is the fourth and final part of a multi-part series of articles reviewing some of the most important aspects of the Microsoft Azure ExpressRoute service.

In part 3 of this article series, we looked at the high-availability (HA) and disaster recovery (DR) concepts of ExpressRoute circuit(s). Finally, it’s time to look at the implementation details of an ExpressRoute circuit.

ExpressRoute Gateway (ER Virtual Network Gateway)

To connect your Azure virtual network and your on-premises network via ExpressRoute, you must create a virtual network gateway first. A virtual network gateway serves two purposes:

  • Exchanges / advertises IP routes between the networks (Azure and on-prem)
  • Routes network traffic between them

Important: The ExpressRoute gateway will advertise the Address Space(s) of the entire Azure VNet – we can’t include/exclude at the subnet level. It is always the VNet Address Space that is advertised. Also, if VNet Peering is used and the peered VNet has “Use Remote Gateway” enabled, the Address Space of the peered VNet will also be advertised.
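
The advertisement rule above can be sketched in a few lines of Python. This is only a model of the behaviour described in this article: the `VNet` class, its field names, and the sample prefixes are hypothetical and are not part of any Azure SDK.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VNet:
    """Hypothetical model of a VNet (not an Azure SDK type)."""
    name: str
    address_spaces: List[str]
    subnets: List[str] = field(default_factory=list)
    use_remote_gateway: bool = False  # only meaningful for peered VNets


def advertised_prefixes(hub: VNet, peered: List[VNet]) -> List[str]:
    """Return the prefixes the ER gateway in `hub` would advertise on-prem."""
    # The whole VNet address space is advertised, never individual subnets.
    prefixes = list(hub.address_spaces)
    for vnet in peered:
        # A peered VNet is included only when "Use Remote Gateway" is enabled.
        if vnet.use_remote_gateway:
            prefixes.extend(vnet.address_spaces)
    return prefixes


hub = VNet("hub", ["10.0.0.0/16"], subnets=["10.0.0.0/27", "10.0.1.0/24"])
spoke_a = VNet("spoke-a", ["10.1.0.0/16"], use_remote_gateway=True)
spoke_b = VNet("spoke-b", ["10.2.0.0/16"], use_remote_gateway=False)

print(advertised_prefixes(hub, [spoke_a, spoke_b]))
```

In this sketch the hub subnets never appear in the result, and spoke-b is left out because it does not use the remote gateway.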

A virtual network gateway is composed of two or more VMs that are deployed to a specific subnet named GatewaySubnet. Virtual network gateway VMs contain routing tables and run specific gateway services. These VMs are created when the virtual network gateway is deployed and can’t be created directly.

When deploying a virtual network gateway, a gateway setting needs to be configured that specifies the gateway type:

  • “Vpn” specifies that the virtual network gateway created is a VPN gateway.
  • “ExpressRoute” specifies that it is an ExpressRoute gateway.

ExpressRoute Gateway Naming and Sizing

The ER gateway requires a dedicated subnet named GatewaySubnet. Naming the gateway subnet GatewaySubnet lets Azure know that this is the subnet to deploy the virtual network gateway VMs and services to. This subnet contains the IP addresses that the gateway VMs and services use.

Note: Never deploy anything else (for example, additional VMs) to the gateway subnet. The one exception is that a VPN Gateway and an ExpressRoute Gateway can be deployed in the same gateway subnet, which is a supported configuration.

When planning the gateway subnet size, consider your particular scenario. For example, the ExpressRoute/VPN Gateway coexistence configuration requires a larger gateway subnet than most other configurations. Additionally, you may want to make sure your gateway subnet contains enough IP addresses to accommodate possible future configurations. While you can create a gateway subnet as small as /29, Microsoft recommends a gateway subnet of /27 or larger (/27, /26, etc.) if you have the address space available. If you plan on connecting 16 ExpressRoute circuits to your gateway, you must create a gateway subnet of /26 or larger.
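
As a rough planning aid, the sizing guidance above can be turned into a small check using Python's standard `ipaddress` module. The function names and sample prefixes are hypothetical. The thresholds come from the paragraph above, and the count of five reserved addresses reflects the addresses Azure reserves in every subnet.

```python
import ipaddress

AZURE_RESERVED_PER_SUBNET = 5  # Azure reserves 5 addresses in every subnet


def usable_addresses(cidr: str) -> int:
    """Addresses left for gateway VMs and services after Azure's reservations."""
    return ipaddress.ip_network(cidr).num_addresses - AZURE_RESERVED_PER_SUBNET


def assess_gateway_subnet(cidr: str, planned_circuits: int = 1) -> str:
    """Classify a candidate GatewaySubnet prefix against the guidance above."""
    net = ipaddress.ip_network(cidr)
    if net.prefixlen > 29:
        return "too small: /29 is the smallest gateway subnet"
    if planned_circuits >= 16 and net.prefixlen > 26:
        return "too small: 16 circuits need /26 or larger"
    if net.prefixlen > 27:
        return "allowed, but /27 or larger is recommended"
    return "meets the recommendation"


for cidr in ("10.0.255.0/29", "10.0.255.0/27", "10.0.255.0/26"):
    print(cidr, usable_addresses(cidr), assess_gateway_subnet(cidr))
```

A /29 leaves only 3 usable addresses in this model, which is one way to see why the larger prefixes are recommended.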

Important: User-defined routes with a 0.0.0.0/0 (default route) destination and NSGs on the GatewaySubnet are not supported.

ExpressRoute Gateway SKU and Availability

When creating a virtual network gateway, you need to specify the gateway SKU that you want to use. A higher gateway SKU allocates more CPUs and network bandwidth to the gateway, so the gateway can support higher network throughput to the virtual network.

The following table lists the ExpressRoute virtual network gateway SKUs available in Azure and the features each one offers:

| ER Gateway SKU | VPN GW and ER GW Coexistence | Max. # of ER Circuits Linked to a VNet |
|---|---|---|
| Standard (older) / ERGw1Az (latest) | No | 4 (circuits from the same peering location) |
| High Performance (older) / ERGw2Az (latest) | Yes | 8 (circuits from different peering locations) |
| Ultra Performance (older) / ErGw3Az (latest) | Yes | 16 (circuits from different peering locations) |
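
As a rough illustration of reading the table, the following Python sketch picks the smallest SKU whose circuit limit covers a planned number of circuits. The limits are copied from the table above, and the function name is hypothetical. Check the current Microsoft documentation before relying on these numbers.

```python
from typing import Optional

# (SKU names, max. circuits linked to a VNet), in ascending order of size.
# Values are taken from the table in this article.
GATEWAY_SKUS = [
    ("Standard / ERGw1Az", 4),
    ("High Performance / ERGw2Az", 8),
    ("Ultra Performance / ErGw3Az", 16),
]


def smallest_sku(planned_circuits: int) -> Optional[str]:
    """Return the smallest SKU that supports the planned circuit count."""
    for name, max_circuits in GATEWAY_SKUS:
        if planned_circuits <= max_circuits:
            return name
    return None  # more circuits than any SKU in the table supports


for circuits in (2, 6, 16, 20):
    print(circuits, smallest_sku(circuits))
```

The circuit count is only one input. Throughput requirements and zone redundancy usually weigh just as heavily in the SKU choice.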

Microsoft has published guidance to help customers pick the right ExpressRoute virtual network gateway for their specific requirements; see “Estimated Performance by ER Gateway SKU”.

Note: All the “latest” SKUs support zone-redundant virtual network gateway deployments. Also, only the latest virtual network gateways support a Standard, static Public IP (which is a zone-redundant Public IP deployment in Azure).

Important: A lower ExpressRoute gateway SKU can be upgraded to a more powerful gateway SKU by using the “Resize-AzVirtualNetworkGateway” PowerShell cmdlet. This works for upgrades to the Standard and HighPerformance SKUs. However, to upgrade to the UltraPerformance SKU, the gateway needs to be recreated, and recreating a gateway incurs downtime.

An Azure Availability Zone in an Azure region is a combination of a fault domain and an update domain. When designing for zone-redundant deployments, it is recommended to configure zone-redundant ExpressRoute virtual network gateways on which the ExpressRoute private peering terminates.

Deploying zone-redundant virtual network gateways in Azure Availability Zones brings resiliency, scalability, and higher availability to virtual network gateways. It physically and logically separates gateways within an Azure region, while protecting the on-premises network connectivity to Azure from zone-level failures. This is visually depicted in the image below:

The following screenshot shows the details to select when creating a zone-redundant ExpressRoute virtual network gateway:

Note: A VNet can have only one virtual network gateway per gateway type: one that uses -GatewayType Vpn, and one that uses -GatewayType ExpressRoute. Also, the ER and VPN gateways can be collocated; however, it’s good design practice to separate them from a redundancy and sizing perspective.

Public IP SKU for Zone-redundant ER Gateway

Zone-redundant gateways and zonal (single-zone) gateways both rely on the Standard SKU of the Azure public IP resource. The configuration of the public IP resource determines whether the gateway is zone-redundant or zonal. When a public IP resource with the Basic SKU is used, the gateway has no zone redundancy and the gateway resources are regional. In other words, a zone-redundant virtual network gateway requires a Standard SKU Public IP.

Important: As a good deployment practice, create the Public IP address for the virtual network gateway in advance (before deploying the gateway itself) and use the Standard Public IP SKU with zone redundancy.

Note: The Public IP address resource is used for internal management only and does not constitute a security exposure of your virtual network.

ExpressRoute Failure Detection Time and MTTR

ExpressRoute supports Bidirectional Forwarding Detection (BFD) over private peering. BFD reduces the failure detection time over the Layer 2 network between the Microsoft Enterprise Edge routers (MSEEs) and their BGP neighbors on the on-premises side from about 3 minutes (default) to less than a second.

This is an on-prem side configuration and needs to be performed on Customer Edge (CE) routing devices or Partner Edge (PE) routing devices depending on the ExpressRoute connectivity arrangement type with the ER provider:

  • If it’s a layer-2 arrangement: configure BFD on the Customer Edge routing devices.
  • If it’s a layer-3 arrangement: configure BFD on the Partner Edge routing devices.

For details on how to configure BFD, refer to Microsoft’s documentation on configuring BFD over ExpressRoute.

Note: The default BGP hold time is 180 seconds and keep-alive messages are sent every 60 seconds. These are fixed settings on the Microsoft side that cannot be changed. It is possible to configure different timers on your own devices, and the BGP session parameters will be negotiated accordingly.
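
To make “negotiated accordingly” concrete, here is a minimal Python sketch of standard BGP timer negotiation as described in RFC 4271: each side uses the smaller of the two proposed hold times, and a keep-alive interval of one third of the hold time is the commonly suggested value. The function name is hypothetical, and the one-third rule is an assumption about how a given router derives its keep-alive interval, since implementations may differ.

```python
from typing import Tuple

MICROSOFT_HOLD_TIME = 180  # seconds, fixed on the Microsoft side per the note above


def negotiate_bgp_timers(local_hold: int,
                         peer_hold: int = MICROSOFT_HOLD_TIME) -> Tuple[int, int]:
    """Return (hold_time, keepalive) in seconds for a BGP session."""
    for value in (local_hold, peer_hold):
        # RFC 4271: the hold time is either 0 or at least 3 seconds.
        if value != 0 and value < 3:
            raise ValueError("BGP hold time must be 0 or at least 3 seconds")
    hold = min(local_hold, peer_hold)  # the smaller proposal wins
    keepalive = hold // 3              # assumption: one third of the hold time
    return hold, keepalive


print(negotiate_bgp_timers(180))  # both sides at the default
print(negotiate_bgp_timers(30))   # a lower on-prem hold time wins
print(negotiate_bgp_timers(240))  # a higher on-prem value is capped by 180
```

Even with aggressive timers, BGP-based detection remains in the range of seconds, which is why BFD is the preferred way to achieve sub-second failure detection.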

Routing and Traffic Control over ExpressRoute

When the on-prem data center network(s) are interconnected with Azure using more than one ExpressRoute circuit, parallel paths are introduced between them. Parallel paths, when not properly architected, can lead to asymmetrical routing. When there are stateful entities (for example, NAT or firewalls) in the path, asymmetrical routing can block traffic flow. Typically, stateful entities such as NAT or firewalls are not encountered over the ExpressRoute private peering path, which is why asymmetrical routing over ExpressRoute private peering doesn’t necessarily block traffic flow. However, when traffic is load-balanced (active-active config) across geo-redundant parallel paths, inconsistent network performance may be experienced regardless of whether there are stateful entities. These geo-redundant parallel paths can run through the same metro or different metros. The issue and its potential solutions were discussed in the section “ExpressRoute Circuit(s) Availability” earlier in this article series.

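Note: Asymmetrical routing occurs when return traffic follows a different path than the outbound packets followed. It becomes a problem when network devices / firewalls that perform stateful packet inspection sit on the return path and block that traffic because they never saw the flow being established.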
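
The effect can be illustrated with a toy model in Python. The `StatefulFirewall` class and the addresses are hypothetical and greatly simplified (real devices track ports, protocol state, and timeouts), but the sketch shows why a return packet is dropped when it arrives at a device that never saw the outbound flow.

```python
class StatefulFirewall:
    """Toy stateful firewall: admits return traffic only for flows it saw leave."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.flows = set()  # (source, destination) pairs seen outbound

    def outbound(self, src: str, dst: str) -> bool:
        # Record the flow so the matching return traffic can be admitted.
        self.flows.add((src, dst))
        return True

    def inbound(self, src: str, dst: str) -> bool:
        # Return traffic is the reverse of a flow recorded earlier.
        return (dst, src) in self.flows


fw_primary = StatefulFirewall("primary-path")
fw_secondary = StatefulFirewall("secondary-path")

on_prem_host, azure_vm = "192.168.10.5", "10.0.1.4"

# The outbound packet leaves through the primary path.
fw_primary.outbound(on_prem_host, azure_vm)

# Symmetric return: same device, flow is known, packet is allowed.
print("symmetric return allowed:", fw_primary.inbound(azure_vm, on_prem_host))

# Asymmetric return: different device, flow is unknown, packet is dropped.
print("asymmetric return allowed:", fw_secondary.inbound(azure_vm, on_prem_host))
```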

Another important design decision in the traffic routing space is whether to allow Internet-bound traffic directly out of Azure or to tunnel it back on-prem through the existing Internet breakout of the organisation. Important drivers in this decision are:

  • Whether the organization wants distributed or centralized traffic control (policy application, regulatory, and compliance requirements).
  • Whether the existing on-prem infrastructure is in-line and capable of supporting the current and future requirements of the Azure traffic.
  • Future plans to explore cloud-native solutions.

Configuring an ER Circuit – High-level Steps

Here’s a quick summary of end-to-end steps involved in planning and deploying ExpressRoute circuit(s):

  • Decide on various aspects of ExpressRoute (Provider, Peering location, SKU, data plan, bandwidth, and any add-on features).
  • Order ExpressRoute circuit(s) with identified provider(s).
  • Plan IP Schema – subnets and VLAN IDs to be used for creation of ER Peering (Private and Microsoft).
  • Plan the on-prem side of the connectivity (which device port the ER is going to terminate on, whether there are enough ports / interfaces available to set up ER in HA for a single data center and a similar config for redundant ER, and whether a dedicated firewall device is needed).
  • Create a Public IP (Standard SKU) in Azure
  • Set up an ExpressRoute Gateway in Azure
  • Create ExpressRoute circuit(s) object in Azure
  • Send the Service Key to the ER Provider
  • The customer or the provider (depending on whether it’s a Layer-2 or Layer-3 type connectivity arrangement) must configure the BGP peering(s).
  • Advertise route prefixes from the on-prem side. The address prefixes advertised from on-premises to Azure over the ExpressRoute peering must not also be advertised through the Internet link of the organization. Doing so causes asymmetric routing, and stateful devices (such as firewalls) tend to drop the packets of a flow that wasn’t established through them.
  • Link the Azure Virtual Network to the ExpressRoute circuit.
  • Optimize the routing when using redundant ExpressRoute circuits.
  • Ensure that the on-premises network also prefers the same ExpressRoute path for Azure-bound traffic to avoid asymmetric flows.
  • Enable Bidirectional Forwarding Detection on your Customer Edge (CE) / Partner Edge (PE) devices (on both the primary and secondary devices).
  • Test failover from Primary ER to Secondary ER.

Further Recommended Reading

As stated in the beginning, there’s already a lot of great documentation available on ER, without which it would not have been possible to bring these details together. I keep referring to the official Microsoft documentation even today, and recommend it as further reading.

That’s it for this time. I hope this was useful and that you enjoyed it as much as I did. See you next time.
