In the last blog post, Express Route to Aviatrix Transit – Option 1, we discussed how to use BGP over IPSec as the overlay from customer on-premises devices to Aviatrix Transit Gateways. That solution has two constraints:
- Each IPSec tunnel has a 1.25 Gbps throughput limit
- Azure only supports IPSec, not GRE, as the tunneling protocol
For customers with larger ExpressRoute circuits, such as 5 Gbps, 10 Gbps, or above, who have no encryption requirement or whose on-premises devices aren't capable of IPSec, Option 1 isn't ideal. In this blog, I will discuss an architecture that connects to Aviatrix Transit and utilizes the full ExpressRoute bandwidth.
In the following architecture diagram:
- Aviatrix Controller must be 6.8 or above to support Multi-Peer BGPoLAN for Azure Route Server. Azure Route Server requires full-mesh peering to avoid a single point of failure, which would otherwise black-hole the traffic flow.
- Aviatrix Transit Gateway must have Insane Mode (High Performance Encryption, HPE) enabled, as well as BGP over LAN.
- Aviatrix Controller allows "Propagate gateway routes" only on the BGP over LAN interface subnet's route table.
- The on-premises to ExpressRoute circuit private peering is similar to Express Route to Aviatrix Transit – Option 1.
- Instead of deploying the ExpressRoute Gateway (ERGW) inside the Aviatrix Transit vNet, we need to create a separate vNet to house the ERGW and Azure Route Server (ARS).
- When native vNet peering is used between the Spoke and the Aviatrix Transit, and ARS sits in the same vNet as the Aviatrix Transit, traffic from spoke to on-premises will bypass the Aviatrix Transit Gateway: the ERGW inserts more specific on-premises routes pointing to the ERGW, while Aviatrix programs less specific RFC1918 routes pointing to the Aviatrix Transit.
- This also applies to HPE-enabled Aviatrix Spokes, because when HPE is enabled, native vNet peering is used as the underlay to build multiple tunnels between the Aviatrix Spoke Gateway and the Aviatrix Transit Gateways.
- From the Aviatrix Transit vNet, create a vNet peering with ARS_ERGW_VNet and enable use_remote_gateways. This lets the ERGW propagate learned routes into the Transit vNet (see the Terraform sketch after this list).
- From ARS_ERGW_VNet, create a vNet peering with the Aviatrix Transit vNet and enable allow_gateway_transit.
- vNet peering is charged at $0.01 per GB for both inbound and outbound data transfer.
- Multi-hop eBGP is enabled between the ARS and the Aviatrix Transit Gateways.
- ARS requires a dedicated subnet named RouteServerSubnet, /27 or larger, which cannot have a UDR or Network Security Group (NSG) attached.
- ERGW requires a dedicated subnet named GatewaySubnet, /27 or larger, which cannot have a UDR or Network Security Group (NSG) attached.
- Branch-to-branch must be enabled on the ARS to exchange routes between the ARS and the ERGW.
- ARS supports 8 BGP peers, and each peer supports up to 1000 routes.
- ARS can only exchange up to 200 routes with the ERGW.
- ARS is a route reflector and is not in the traffic path.
- ARS cost: $0.45 USD/hour, or roughly $324 USD per month, which is not cheap for a service that isn't in the data path.
- When you create or delete an Azure Route Server in a virtual network that contains a Virtual Network Gateway (ExpressRoute or VPN), expect downtime until the operation completes. Reference Link
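The two peerings above can be expressed with the azurerm provider as follows. This is a minimal sketch with placeholder names, resource group, and vNet IDs; the Terraform in the next section creates these peerings as part of the deployment, so this is only to illustrate the two gateway-related settings.
# Peering from the Aviatrix Transit vNet to the ARS/ERGW vNet.
# use_remote_gateways lets the Transit vNet receive routes propagated by the ERGW.
resource "azurerm_virtual_network_peering" "transit_to_ars_ergw" {
  name                      = "transit-to-ars-ergw"                  # placeholder name
  resource_group_name       = "ER-LAB"                               # Transit vNet resource group
  virtual_network_name      = "transit-vnet"                         # placeholder Transit vNet name
  remote_virtual_network_id = "<ars-ergw-vnet-resource-id>"          # placeholder
  allow_forwarded_traffic   = true
  use_remote_gateways       = true
}
# Peering from the ARS/ERGW vNet back to the Aviatrix Transit vNet.
# allow_gateway_transit lets the ERGW in this vNet serve the peered Transit vNet.
resource "azurerm_virtual_network_peering" "ars_ergw_to_transit" {
  name                      = "ars-ergw-to-transit"                  # placeholder name
  resource_group_name       = "ER-LAB"                               # ARS/ERGW vNet resource group
  virtual_network_name      = "ars-ergw-vnet"                        # placeholder ARS/ERGW vNet name
  remote_virtual_network_id = "<aviatrix-transit-vnet-resource-id>"  # placeholder
  allow_forwarded_traffic   = true
  allow_gateway_transit     = true
}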
Aviatrix Spoke, Transit and ARS/ERGW deployment
I'm using the following Terraform code to deploy the Aviatrix Transit with HPE and BGP over LAN enabled, deploy the Aviatrix Spoke and attach it to the Transit, create a separate vNet for the ARS and ERGW, create the ARS and ERGW, create the vNet peering between the Aviatrix Transit vNet and the ARS/ERGW vNet, and create the BGP over LAN connection between the Aviatrix Transit Gateways and the ARS.
# Deploy Aviatrix Transit vNet and Transit Gateways
module "transit" {
source = "terraform-aviatrix-modules/mc-transit/aviatrix"
version = "2.3.1"
cloud = "Azure"
region = "West US"
cidr = "10.0.16.0/23"
account = "azure-test-jye"
local_as_number = 65001
insane_mode = true
enable_bgp_over_lan = true
bgp_lan_interfaces_count = 1
instance_size = "Standard_D4_v2"
az_support = false
name = "transit"
gw_name = "transit"
resource_group = "ER-LAB"
}
# Deploy Aviatrix Spoke vNet and Spoke Gateways, then attach to Transit
module "mc-spoke" {
source = "terraform-aviatrix-modules/mc-spoke/aviatrix"
version = "1.4.1"
cloud = "Azure"
region = "West US"
cidr = "10.0.32.0/24"
account = "azure-test-jye"
transit_gw = module.transit.transit_gateway.gw_name
name = "spoke"
az_support = false
}
# Deploy ARS and ERGW, and create BGP over LAN connection between Aviatrix Transit GW and ARS
module "azure_route_server" {
source = "terraform-aviatrix-modules/azure-route-server/aviatrix"
version = "1.0.1"
name = "ars"
transit_vnet_obj = module.transit.vpc
transit_gw_obj = module.transit.transit_gateway
cidr = "10.0.10.0/24"
resource_group_name = module.transit.vpc.resource_group
}
ER Circuit, connection and on-premise router configuration
This part is very similar to Express Route to Aviatrix Transit – Option 1:
- Create the ExpressRoute circuit and have it provisioned with the provider
- Create the private BGP peering between the ER circuit's Microsoft Enterprise Edge (MSEE) routers and the on-premises device (a Terraform sketch follows)
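If you prefer Terraform over the portal for these two steps, a minimal sketch with the azurerm provider could look like the following. The service provider, peering location, bandwidth, and secondary prefix are placeholders; the VLAN ID and primary /30 match the lab values used below, and the circuit still has to be provisioned by the connectivity provider before the peering comes up.
# ExpressRoute circuit (must still be provisioned by the connectivity provider)
resource "azurerm_express_route_circuit" "er" {
  name                  = "er-circuit"          # placeholder name
  resource_group_name   = "ER-LAB"
  location              = "West US"
  service_provider_name = "Equinix"             # placeholder provider
  peering_location      = "Silicon Valley"      # placeholder peering location
  bandwidth_in_mbps     = 10000                 # placeholder bandwidth

  sku {
    tier   = "Standard"
    family = "MeteredData"
  }
}
# Private peering towards the on-premises router (ASN 65000 in this lab)
resource "azurerm_express_route_circuit_peering" "private" {
  peering_type                  = "AzurePrivatePeering"
  express_route_circuit_name    = azurerm_express_route_circuit.er.name
  resource_group_name           = "ER-LAB"
  peer_asn                      = 65000
  primary_peer_address_prefix   = "169.254.80.80/30"   # matches the router config below
  secondary_peer_address_prefix = "169.254.80.84/30"   # placeholder secondary /30
  vlan_id                       = 803                  # matches dot1Q 803 below
}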
On-premises router configuration peering with MSEE
- GigabitEthernet0/0/0.803 (169.254.80.81) is connected to the customer-side primary link subnet 169.254.80.80/30
- A loopback is created for testing connectivity from the cloud; we only advertise the loopback towards the ER
- A BGP session is created towards the MSEE at 169.254.80.82; note the ER always uses ASN 12076
interface GigabitEthernet0/0/0.803
 description to be connected to an Azure ER circuit
 encapsulation dot1Q 803
 ip address 169.254.80.81 255.255.255.252
interface Loopback88
 ip address 192.168.88.88 255.255.255.255
router bgp 65000
 bgp log-neighbor-changes
 neighbor 169.254.80.82 remote-as 12076
 neighbor 169.254.80.82 description Express Route
 !
 address-family ipv4
  network 192.168.88.88 mask 255.255.255.255
  neighbor 169.254.80.82 activate
  neighbor 169.254.80.82 soft-reconfiguration inbound
  neighbor 169.254.80.82 prefix-list router-to-er out
  maximum-paths 8
 exit-address-family
ip prefix-list router-to-er description Advertise Loopback only
ip prefix-list router-to-er seq 10 permit 192.168.88.88/32
- Create the ER circuit connection between the ER circuit and the ExpressRoute Gateway (the ERGW was provisioned by the Terraform script above); a Terraform sketch of this connection follows
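A minimal Terraform sketch of that connection with the azurerm provider, assuming the circuit resource from the earlier sketch and a placeholder ERGW resource ID:
# Connect the ExpressRoute circuit to the ERGW in the ARS/ERGW vNet
resource "azurerm_virtual_network_gateway_connection" "er_to_ergw" {
  name                       = "er-to-ergw"                      # placeholder name
  resource_group_name        = "ER-LAB"
  location                   = "West US"
  type                       = "ExpressRoute"
  virtual_network_gateway_id = "<ergw-resource-id>"              # ERGW created by the Terraform above
  express_route_circuit_id   = azurerm_express_route_circuit.er.id
}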
If you want to create the Aviatrix Transit Gateway BGP over LAN connection to ARS manually:
Note down the ARS ASN and peering IPs.
Note down the Aviatrix Transit GW BGP over LAN interface IPs. Note that this interface may not be eth1: if you enabled Transit FireNet, which creates two additional interfaces, the BGP over LAN interface could be pushed to eth3.
Azure Portal -> Primary Transit Gateway -> Networking -> find the bgp_lan interface and note down its private IP, e.g. 10.0.16.68
Azure Portal -> HA Transit Gateway -> Networking -> find the bgp_lan interface and note down its private IP, e.g. 10.0.16.76
Obtain the Aviatrix Transit Gateway ASN: Multi-Cloud Transit -> Advanced -> Select Transit GW -> Local AS Number. Note it down; if it isn't set, make sure to set it, avoiding conflicts with your existing ASNs or Azure reserved ASNs.
Configure the peerings on the ARS. The example shows the two peerings with the Primary and HA Aviatrix Transit Gateways, using the Aviatrix Transit Gateway name, the ASN of the Aviatrix Transit Gateway, and the corresponding bgp_lan interface private IPs noted earlier (an equivalent Terraform sketch follows).
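If you script this step instead, the azurerm provider exposes ARS peerings as azurerm_route_server_bgp_connection resources. A minimal sketch, assuming a placeholder Route Server resource ID and the ASN and interface IPs noted above:
# BGP peering from ARS to the Primary Aviatrix Transit Gateway bgp_lan interface
resource "azurerm_route_server_bgp_connection" "transit_primary" {
  name            = "transit-primary"       # placeholder name
  route_server_id = "<ars-resource-id>"     # placeholder ARS resource ID
  peer_asn        = 65001                   # Aviatrix Transit Gateway ASN
  peer_ip         = "10.0.16.68"            # Primary bgp_lan interface IP
}
# BGP peering from ARS to the HA Aviatrix Transit Gateway bgp_lan interface
resource "azurerm_route_server_bgp_connection" "transit_ha" {
  name            = "transit-ha"            # placeholder name
  route_server_id = "<ars-resource-id>"     # placeholder ARS resource ID
  peer_asn        = 65001                   # Aviatrix Transit Gateway ASN
  peer_ip         = "10.0.16.76"            # HA bgp_lan interface IP
}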
Aviatrix Controller -> Multi-Cloud Transit -> Setup -> External connections:
- Select External Device, BGP, LAN
- Select the Transit vNet
- Enable Remote Gateway HA AND enable BGP ActiveMesh (ARS requires full-mesh BGP)
- Set Remote BGP AS Number and Remote BGP AS Number (Backup) both to 65515 (the ARS ASN noted earlier)
- Provide the ARS peering IPs noted earlier in Remote LAN IP and Remote LAN IP (Backup) (the equivalent Terraform is sketched below)
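The same external connection can also be declared with the Aviatrix Terraform provider. This is a minimal sketch assuming the aviatrix_transit_external_device_conn resource in a recent provider version; the vNet ID and ARS peering IPs are placeholders, and your provider version may require additional Azure-specific arguments, so treat it as illustrative rather than authoritative.
# BGP over LAN connection from the Aviatrix Transit Gateways to the two ARS instances
resource "aviatrix_transit_external_device_conn" "to_ars" {
  vpc_id                    = "<transit-vnet-id>"    # placeholder Aviatrix Transit vNet ID
  connection_name           = "transit-to-ars"       # placeholder name
  gw_name                   = "transit"
  connection_type           = "bgp"
  tunnel_protocol           = "LAN"
  ha_enabled                = true
  enable_bgp_lan_activemesh = true                   # full-mesh BGP, required by ARS
  bgp_local_as_num          = "65001"
  bgp_remote_as_num         = "65515"                # ARS ASN
  backup_bgp_remote_as_num  = "65515"
  remote_lan_ip             = "<ars-peer-ip-1>"      # placeholder ARS instance IP
  backup_remote_lan_ip      = "<ars-peer-ip-2>"      # placeholder ARS instance IP
}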
Validation
In Aviatrix Controller -> Site2Cloud -> Setup, you should observe that the S2C connection is up.
In Aviatrix Controller -> Multi-Cloud Transit -> BGP -> Diagnostics, select the Transit GW and run show ip bgp; we can see the on-prem route 192.168.88.88 learned with the expected AS path: 65515 (ARS/ERGW), 12076 (ER private peering MSEE), 65000 (on-prem router).
10.0.10.0/24 is the ARS/ERGW vNet and 10.0.16.0/23 is the Aviatrix Transit vNet.
We can also observe the same from CoPilot -> Troubleshoot -> Cloud Routes -> Site2Cloud. Note how full-mesh tunnels have been established.
CoPilot -> Troubleshoot -> Cloud Routes -> BGP Info -> Learned CIDR
CoPilot -> Troubleshoot -> Cloud Routes -> BGP Info -> Advertised CIDR
The on-premises router has the spoke route 10.0.32.0/24 advertised to it:
show ip bgp
BGP table version is 53, local router ID is 192.168.77.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
              t secondary path, L long-lived-stale,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>   10.0.10.0/24     169.254.80.82                          0 12076 i
 *>   10.0.16.0/23     169.254.80.82                          0 12076 i
 *>   10.0.32.0/24     169.254.80.82                          0 12076 i
 *>   10.1.30.10/32    0.0.0.0                  0         32768 i
 *>   10.1.31.0/24     0.0.0.0                  0         32768 i
 *>   10.1.32.0/24     0.0.0.0                  0         32768 i
 *>   192.168.88.88/32 0.0.0.0                  0         32768 i
Log on to a VM in the spoke vNet and ping the on-premises router loopback.
Traceroute towards the on-premises router loopback:
- 10.0.32.5 is the spoke gateway
- 10.0.17.196 is the transit gateway
- 10.0.10.4 is the ERGW
- 169.254.80.81 is the on-premises router interface connected to the MSEE peering
Packet walkthrough
When the spoke VM 10.0.32.20 tries to reach the on-premises router loopback 192.168.88.88, the VM itself will use its default route and send the traffic via eth0.
The VM subnet route table sends the traffic to the spoke gateway via the Controller-programmed RFC1918 routes.
The spoke HA gateway has IP 10.0.32.5.
In the spoke gateway route table, we can see it sends the traffic to the Transit Gateways via the IPSec tunnels; there is also an alternative route via the primary spoke GW.
The Transit GW route table shows it will send the traffic out via eth1 (we noted earlier this is the bgp_lan interface), with next-hop IPs of 10.0.16.65 and 10.0.16.73.
10.0.16.65 is the Azure subnet router of subnet 10.0.16.64/29
10.0.16.73 is the Azure subnet router of subnet 10.0.16.72/29
So this traffic is subject to the effective routes of eth1 (reminder: as mentioned before, this is not always eth1; it could be eth3 if Transit FireNet is enabled).
In the effective routes of the Primary Transit GW, 192.168.88.88/32 has a next hop type of Virtual Network Gateway pointing to 10.3.129.70. This is the MSEE router, which you don't have control over.
At this point, the only remaining visibility point is the ER circuit private peering route table.
Is there another method to connect to Aviatrix Transit if we need to:
- Encrypt all the way from the vNet to on-premises
- Enjoy the full bandwidth of ExpressRoute without the IPSec limits
- Provide enterprise-grade visibility, monitoring, and troubleshooting for mission-critical workloads
Stay tuned for the next blog.
Note
- In the current Aviatrix implementation of ARS with Aviatrix Transit BGP multi-peer, we reuse the previous BGP over LAN workflow, which was meant to integrate with SD-WAN appliances. In that workflow, the bgp_lan subnet route table is programmed with a 0.0.0.0/0 route pointing to the BGP peer's IP address. Since this workflow is inherited by the ARS integration, it will program 0.0.0.0/0 pointing to the ARS instance. If the on-premises network advertises a 0.0.0.0/0 route towards the ER, the UDR's default route will force cloud-to-internet traffic through the ARS, and the ARS will then use its own route table to redirect that traffic to the ExpressRoute Gateway; the ARS is not designed to handle ExpressRoute-level bandwidth. The workaround is to remove the 0.0.0.0/0 UDR from the bgp_lan route table and place a lock on the route table to prevent modification until Aviatrix releases a fix (a Terraform sketch of such a lock follows these notes).
- If one of the Aviatrix Transit Gateways goes down, connectivity will remain as long as the remaining Aviatrix Transit Gateway keeps its BGP peerings with the two Azure Route Server instances. When both ARS instances are up, they expect to receive the same routes; as such, we cannot simulate a single ARS instance outage from the Aviatrix side by blocking traffic to a single ARS instance from both Transits.
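As a hedged sketch of the lock mentioned in the first note (the scope is a placeholder for the bgp_lan route table resource ID):
# Read-only lock to keep the bgp_lan route table from being re-programmed with the 0/0 UDR
resource "azurerm_management_lock" "bgp_lan_rt_lock" {
  name       = "bgp-lan-rt-lock"                       # placeholder name
  scope      = "<bgp-lan-route-table-resource-id>"     # placeholder route table ID
  lock_level = "ReadOnly"
  notes      = "Prevent re-adding the 0.0.0.0/0 UDR until a fix is released"
}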