Express Route to Aviatrix Transit – Option 2

In the last blog post, Express Route to Aviatrix Transit – Option 1, we discussed how to use BGP over IPSec as an overlay from customer on-premise devices to Aviatrix Transit Gateways. That solution has two constraints:

  • Each IPSec tunnel has a 1.25 Gbps throughput limit
  • Azure only supports IPSec, not GRE, as the tunneling protocol

For customers with a larger ExpressRoute circuit, such as 5 Gbps, 10 Gbps, or above, who don’t have an encryption requirement or whose on-premise devices aren’t capable of IPSec, Option 1 isn’t ideal. In this blog, I will discuss an architecture that connects to Aviatrix Transit and utilizes the full ExpressRoute bandwidth.

In the following architecture diagram:

  • Aviatrix Controller must be 6.8 or above to support Multi-Peer BGPoLAN for Azure Route Server. Azure Route Server requires full-mesh peering to avoid a single point of failure, which would otherwise result in black-holed traffic.
  • Aviatrix Transit Gateway must have Insane Mode (High Performance Encryption, HPE) enabled, as well as BGP over LAN enabled.
    • Aviatrix Controller allows “Propagate gateway routes” only on the BGP over LAN interface subnet route table.
  • The on-premise to ExpressRoute circuit private peering is similar to Express Route to Aviatrix Transit – Option 1
  • Instead of deploying the ExpressRoute Gateway (ERGW) inside the Aviatrix Transit vNet, we need to create a separate vNet to house the ERGW and Azure Route Server (ARS)
    • When native vNet peering is used between a Spoke and Aviatrix Transit, and ARS sits in the same Aviatrix Transit vNet, traffic from the spoke to on-premise will bypass the Aviatrix Transit Gateway: the ERGW inserts more specific on-premise routes pointing at the ERGW, while Aviatrix programs less specific RFC1918 routes pointing at Aviatrix Transit
    • This also applies to HPE-enabled Aviatrix Spokes, because when HPE is enabled, native vNet peering is used as the underlay to build multiple tunnels between the Aviatrix Spoke Gateway and the Aviatrix Transit Gateways.
    • From the Aviatrix Transit vNet, create a vNet peering with ARS_ERGW_VNet and enable use_remote_gateways. This allows the ERGW to propagate learned routes to the Transit vNet (see the peering sketch after this list)
    • From the ARS_ERGW_VNet vNet, create a vNet peering with the Aviatrix Transit vNet and enable allow_gateway_transit.
    • vNet peering is subject to $0.01 per GB for both inbound and outbound data transfer.
  • Multi-hop eBGP is enabled between ARS and Aviatrix Transit Gateway
  • ARS requires a dedicated RouteServerSubnet subnet, /27 or larger, which cannot have a UDR or Network Security Group (NSG) attached
  • ERGW requires a dedicated GatewaySubnet subnet, /27 or larger, which cannot have a UDR or Network Security Group (NSG) attached
  • Branch-to-Branch must be enabled on ARS to exchange routes between ARS and ERGW
  • ARS supports 8 BGP peers; each peer supports up to 1000 routes
  • ARS can only exchange up to 200 routes with ERGW
  • ARS is a route reflector and is not in the traffic path.
  • ARS cost: $0.45 USD/hour, or about $324 USD per month; for a service that’s not in the data path, it’s not cheap
  • When you create or delete an Azure Route Server in a virtual network that contains a Virtual Network Gateway (ExpressRoute or VPN), expect downtime until the operation completes. Reference Link
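For reference, the two peerings described above boil down to something like the following. This is a minimal sketch, assuming both vNets are managed as plain azurerm resources with the names shown; the Aviatrix azure-route-server Terraform module used in the next section creates these peerings for you.

# Transit vNet -> ARS/ERGW vNet: consume the remote gateway (ERGW) so that its
# learned routes propagate into the Transit vNet
resource "azurerm_virtual_network_peering" "transit_to_ars_ergw" {
  name                      = "transit-to-ars-ergw"              # hypothetical name
  resource_group_name       = "ER-LAB"
  virtual_network_name      = "transit-vnet"                     # Aviatrix Transit vNet
  remote_virtual_network_id = azurerm_virtual_network.ars_ergw.id
  allow_forwarded_traffic   = true
  use_remote_gateways       = true
}

# ARS/ERGW vNet -> Transit vNet: offer gateway transit to the Transit vNet
resource "azurerm_virtual_network_peering" "ars_ergw_to_transit" {
  name                      = "ars-ergw-to-transit"
  resource_group_name       = "ER-LAB"
  virtual_network_name      = "ars-ergw-vnet"                    # ARS_ERGW_VNet
  remote_virtual_network_id = azurerm_virtual_network.transit.id
  allow_forwarded_traffic   = true
  allow_gateway_transit     = true
}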

Aviatrix Spoke, Transit and ARS/ERGW deployment

I’m using the following Terraform code to deploy Aviatrix Transit with HPE and BGP over LAN enabled, deploy an Aviatrix Spoke and attach it to the transit, create a separate vNet for ARS and ERGW, create the ARS and ERGW, create the vNet peering between the Aviatrix Transit vNet and the ARS/ERGW vNet, and create the BGP over LAN connection between the Aviatrix Transit Gateway and ARS.

# Deploy Aviatrix Transit vNet and Transit Gateways
module "transit" {
  source  = "terraform-aviatrix-modules/mc-transit/aviatrix"
  version = "2.3.1"
  
  cloud   = "Azure"
  region  = "West US"
  cidr    = "10.0.16.0/23"
  account = "azure-test-jye"

  local_as_number          = 65001
  insane_mode              = true
  enable_bgp_over_lan      = true
  bgp_lan_interfaces_count = 1
  instance_size            = "Standard_D4_v2"
  az_support = false
  name = "transit"
  gw_name = "transit"
  resource_group = "ER-LAB"
}

# Deploy Aviatrix Spoke vNet and Spoke Gateways, then attach to Transit
module "mc-spoke" {
  source  = "terraform-aviatrix-modules/mc-spoke/aviatrix"
  version = "1.4.1"
  cloud   = "Azure"
  region  = "West US"
  cidr    = "10.0.32.0/24"
  account = "azure-test-jye"
  transit_gw = module.transit.transit_gateway.gw_name
  name = "spoke"
  az_support = false
}

# Deploy ARS and ERGW, and create BGP over LAN connection between Aviatrix Transit GW and ARS
module "azure_route_server" {
  source  = "terraform-aviatrix-modules/azure-route-server/aviatrix"
  version = "1.0.1"

  name             = "ars"
  transit_vnet_obj = module.transit.vpc
  transit_gw_obj   = module.transit.transit_gateway
  cidr             = "10.0.10.0/24"
  resource_group_name = module.transit.vpc.resource_group
}

ER Circuit, connection and on-premise router configuration

This part is very similar to Express Route to Aviatrix Transit – Option 1

  • Create the ExpressRoute circuit and have it provisioned with the provider
  • Create the private BGP peering between the ER circuit’s Microsoft Enterprise Edge (MSEE) routers and the on-premise device

On-premise router configuration peering with MSEE

  • GigabitEthernet0/0/0.803 (169.254.80.81) is connected to the customer-side primary link subnet 169.254.80.80/30
  • A loopback is created for testing connectivity from the cloud; we only advertise the loopback towards ER
  • A BGP session is created towards the MSEE at 169.254.80.82; note that ER always uses ASN 12076
interface GigabitEthernet0/0/0.803
 description to be connected to an Azure ER circuit
 encapsulation dot1Q 803
 ip address 169.254.80.81 255.255.255.252

interface Loopback88
 ip address 192.168.88.88 255.255.255.255

router bgp 65000
 bgp log-neighbor-changes
 neighbor 169.254.80.82 remote-as 12076
 neighbor 169.254.80.82 description Express Route
 !
 address-family ipv4
  network 192.168.88.88 mask 255.255.255.255
  neighbor 169.254.80.82 activate
  neighbor 169.254.80.82 soft-reconfiguration inbound
  neighbor 169.254.80.82 prefix-list router-to-er out
  maximum-paths 8
 exit-address-family

ip prefix-list router-to-er description Advertise Loopback only
ip prefix-list router-to-er seq 10 permit 192.168.88.88/32

  • Create the ER circuit connection between the ER circuit and the ExpressRoute Gateway (the ERGW was provisioned by the above TF script); a Terraform sketch of this connection is shown below
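If you prefer to script this step too, a minimal Terraform sketch is below. The connection name, the ERGW reference, and the circuit reference are assumptions for illustration; point them at the ERGW created by the module above and at your provisioned circuit.

# Minimal sketch: connect the provisioned ER circuit to the ExpressRoute Gateway
resource "azurerm_virtual_network_gateway_connection" "er_to_ergw" {
  name                = "er-circuit-to-ergw"   # hypothetical name
  location            = "West US"
  resource_group_name = "ER-LAB"

  type                       = "ExpressRoute"
  virtual_network_gateway_id = azurerm_virtual_network_gateway.ergw.id  # ERGW from the module above
  express_route_circuit_id   = azurerm_express_route_circuit.er.id      # your provisioned circuit
}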

If you want to create the Aviatrix Transit Gateway BGP over LAN connection to ARS manually:

Note down the ARS ASN and peering IPs

Note down the Aviatrix Transit GW BGP over LAN interface IP. Note this interface may not be eth1: if you enabled Transit FireNet, which creates two additional interfaces, the BGP over LAN interface could be pushed to eth3

Azure Portal -> Primary Transit Gateway -> Networking -> find the bgp_lan interface and note down its private IP, e.g. 10.0.16.68

Azure Portal -> HA Transit Gateway -> Networking -> find the bgp_lan interface and note down its private IP, e.g. 10.0.16.76

Obtain the Aviatrix Transit Gateway ASN: Multi-Cloud Transit -> Advanced -> Select Transit GW -> Local AS Number. Note it down, or if it’s not set, make sure to set it while avoiding conflicts with your existing ASNs or Azure reserved ASNs.

Configure the peerings in ARS. The example shows the two peerings with the Primary and HA Aviatrix Transit Gateways, using the Aviatrix Transit Gateway name, the ASN of the Aviatrix Transit Gateway, and the corresponding bgp_lan interface private IPs noted earlier. A Terraform sketch of these peerings follows.
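If you are scripting this step, a minimal Terraform sketch of the two ARS peerings is below; the connection names are hypothetical, and the ASN and peer IPs mirror the example values noted above.

# Minimal sketch (assumption: the ARS was created as azurerm_route_server.ars)
resource "azurerm_route_server_bgp_connection" "avx_transit_primary" {
  name            = "avx-transit-primary"     # hypothetical name
  route_server_id = azurerm_route_server.ars.id
  peer_asn        = 65001                     # Aviatrix Transit Gateway ASN
  peer_ip         = "10.0.16.68"              # primary bgp_lan interface IP
}

resource "azurerm_route_server_bgp_connection" "avx_transit_ha" {
  name            = "avx-transit-ha"
  route_server_id = azurerm_route_server.ars.id
  peer_asn        = 65001
  peer_ip         = "10.0.16.76"              # HA bgp_lan interface IP
}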

Aviatrix Controller -> Multi-Cloud Transit -> Setup -> External connections ->

  • Select External device, BGP, LAN
  • Select Transit vNet
  • Enable Remote Gateway HA AND enable BGP ActiveMesh (ARS requires full-mesh BGP)
  • Set Remote BGP AS Number and Remote BGP AS Number (Backup) both to 65515 (the ARS static ASN, as noted earlier)
  • Provide the ARS peering IPs noted earlier in Remote LAN IP and Remote LAN IP (Backup); a rough Terraform equivalent follows this list
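For completeness, a rough Terraform sketch of the same external connection using the Aviatrix provider is below. The connection name and ARS peering IPs are placeholders, and the exact attribute set should be verified against the Aviatrix provider documentation for your Controller version.

# Rough sketch of the BGP over LAN external connection to ARS (attribute names per
# the Aviatrix provider; verify against your provider/Controller version)
resource "aviatrix_transit_external_device_conn" "to_ars" {
  vpc_id          = module.transit.vpc.vpc_id
  connection_name = "transit-to-ars"                        # hypothetical name
  gw_name         = module.transit.transit_gateway.gw_name
  connection_type = "bgp"
  tunnel_protocol = "LAN"
  ha_enabled      = true

  bgp_remote_as_num        = 65515                          # ARS static ASN
  backup_bgp_remote_as_num = 65515
  remote_lan_ip            = "10.0.10.36"                   # ARS instance peering IP (placeholder)
  backup_remote_lan_ip     = "10.0.10.37"                   # ARS instance peering IP (placeholder)
  # BGP ActiveMesh is enabled in the Controller workflow above; the matching
  # provider attribute may vary by version, so it is omitted here.
}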

Validation

Aviatrix Controller -> Site2Cloud -> Setup: you should observe that the S2C connection is up

Aviatrix Controller -> Multi-Cloud Transit -> BGP -> Diagnostics -> select the Transit GW and run show ip bgp. We can see the on-prem route 192.168.88.88 learned with the proper AS path: 65515 (ARS/ERGW), 12076 (ER private peering MSEE), 65000 (on-prem router)

10.0.10.0/24 is ARS/ERGW vNet and 10.0.16.0/23 is Aviatrix Transit vNet

We can also observe the same from CoPilot -> Troubleshoot -> Cloud Routes -> Site2Cloud. Note how full-mesh tunnels have been established.

CoPilot -> Troubleshoot -> Cloud Routes -> BGP Info -> Learned CIDR

CoPilot -> Troubleshoot -> Cloud Routes -> BGP Info -> Advertised CIDR

The on-premise router has the spoke route 10.0.32.0/24 advertised to it:

show ip bgp
BGP table version is 53, local router ID is 192.168.77.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
              t secondary path, L long-lived-stale,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>   10.0.10.0/24     169.254.80.82                          0 12076 i
 *>   10.0.16.0/23     169.254.80.82                          0 12076 i
 *>   10.0.32.0/24     169.254.80.82                          0 12076 i
 *>   10.1.30.10/32    0.0.0.0                  0         32768 i
 *>   10.1.31.0/24     0.0.0.0                  0         32768 i
 *>   10.1.32.0/24     0.0.0.0                  0         32768 i
 *>   192.168.88.88/32 0.0.0.0                  0         32768 i

Log on to a VM in the spoke vNet and ping the on-premise router loopback

Traceroute towards the on-premise router loopback:

  • 10.0.32.5 is the spoke gateway
  • 10.0.17.196 is the transit gateway
  • 10.0.10.4 is the ERGW
  • 169.254.80.81 is the on-premise router interface connected to the MSEE peering

Packet walk through

When the spoke VM 10.0.32.20 tries to reach the on-premise router loopback 192.168.88.88, the VM itself will use its default route, sending via eth0.

The VM subnet route table sends the traffic to the spoke gateway via the controller-programmed RFC1918 routes; a sketch of such a UDR is shown below.
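As an illustration, the controller-programmed UDR looks roughly like the following minimal sketch. The route table name is hypothetical and the spoke gateway IP 10.0.32.5 is the one from this lab; the controller creates and manages this route table itself, so this is for reading along, not for hand-deployment.

# Minimal sketch of the RFC1918 UDR the controller programs on the VM subnet
resource "azurerm_route_table" "spoke_vm_subnet_rt" {
  name                = "spoke-vm-subnet-rt"   # hypothetical name
  location            = "West US"
  resource_group_name = "ER-LAB"

  route {
    name                   = "rfc1918-10"
    address_prefix         = "10.0.0.0/8"
    next_hop_type          = "VirtualAppliance"
    next_hop_in_ip_address = "10.0.32.5"       # spoke gateway
  }
  route {
    name                   = "rfc1918-172"
    address_prefix         = "172.16.0.0/12"
    next_hop_type          = "VirtualAppliance"
    next_hop_in_ip_address = "10.0.32.5"
  }
  route {
    name                   = "rfc1918-192"
    address_prefix         = "192.168.0.0/16"
    next_hop_type          = "VirtualAppliance"
    next_hop_in_ip_address = "10.0.32.5"
  }
}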

Spoke HA gateway has IP 10.0.32.5

In the spoke gateway route table, we can see it’s sending to the Transit Gateways via IPSec tunnels; there’s also an alternative route via the primary spoke GW.

The Transit GW route table shows it will send out via eth1 (we noted earlier this is the bgp_lan interface), with next hop IPs of 10.0.16.65 and 10.0.16.73.

10.0.16.65 is the Azure subnet router of subnet 10.0.16.64/29
10.0.16.73 is the Azure subnet router of subnet 10.0.16.72/29
(Azure reserves the first usable address of each subnet as the default gateway.)

So this traffic is subject to the effective routes of eth1 (reminder: as mentioned before, not always eth1; it could be eth3 if Transit FireNet is enabled).

In the effective routes of the Primary Transit GW, 192.168.88.88/32 has a next hop type of Virtual Network Gateway, pointing to 10.3.129.70. This is the MSEE router, which you don’t have control over.

At this point, the only remaining visibility point would be the ER circuit private peering route table.

Is there another method to connect to Aviatrix Transit if we need to:

  • Encrypt all the way from the vNet to on-premise
  • Enjoy the full bandwidth of ExpressRoute without the IPSec limits
  • Provide enterprise-grade visibility, monitoring, and troubleshooting for mission-critical workloads?

Stay tuned for the next blog.

Note

  1. In the current Aviatrix implementation of ARS with Aviatrix Transit BGP multi-peer, we reuse the earlier BGP over LAN workflow, which was originally meant for integrating with SD-WAN appliances. In that workflow, the bgp_lan subnet route table is programmed with a 0.0.0.0/0 route pointing to the BGP peer’s IP address. Since this workflow is inherited by the ARS integration, it will program 0.0.0.0/0 pointing to the ARS instance. If the on-premise network advertises a 0.0.0.0/0 route towards ER, the UDR’s default route will force cloud-to-internet traffic through ARS, and ARS will use its own route table to redirect that traffic to the ExpressRoute Gateway; ARS isn’t designed to handle ExpressRoute-scale bandwidth. The workaround is to remove the 0.0.0.0/0 UDR from the bgp_lan route table and place a lock on the route table to prevent modification until Aviatrix releases a fix (a sketch of such a lock follows these notes).
  2. If one of the Aviatrix Transit Gateways goes down, connectivity will remain as long as the remaining Aviatrix Transit Gateway keeps its BGP peerings with the two Azure Route Server instances. When both ARS instances are up, they are expected to receive the same routes; as such, we cannot simulate a single ARS instance outage from the Aviatrix side by blocking traffic to a single ARS instance from both Transit Gateways.
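For note 1, a minimal Terraform sketch of such a lock is below; the lock name is hypothetical, and the route table reference stands in for the controller-created bgp_lan route table (you would normally supply its resource ID).

# Minimal sketch: read-only lock on the bgp_lan route table so the 0.0.0.0/0 route
# cannot be re-programmed until a fix is released
resource "azurerm_management_lock" "bgp_lan_rt_lock" {
  name       = "bgp-lan-rt-lock"                   # hypothetical name
  scope      = azurerm_route_table.bgp_lan_rt.id   # ID of the bgp_lan subnet route table
  lock_level = "ReadOnly"
  notes      = "Prevent 0/0 from being re-added to the bgp_lan route table"
}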
