Aviatrix Source NAT (SNAT) on BGP Spoke

An Aviatrix customer has applications running in various VPCs, and these applications need to communicate with FiServ securely via BGP over IPSec tunnels. FiServ also has many customers of its own, most of which likely use RFC1918 ranges as internal address space. To avoid conflicts, FiServ assigns non-RFC1918 IP prefixes to its customers. Incoming connections must be Source NATed (SNAT) to these non-RFC1918 IP prefixes, and the FiServ firewalls only allow incoming traffic from these prefixes.

FiServ reference architecture: https://developer.fiserv.com/product/FirstVisionEMEA/docs/?path=docs/Support/Client-Onboarding.md&branch=main#connectivity-overview-diagram

This blog walks through the various considerations that affect the final design.

We can start with a traditional Aviatrix Spoke and Hub architecture, shown on the left side.

Application VPC : uw1spoke2 10.64.100.0/24
Application VPC : uw1spoke1 10.16.100.0/24
BGP Spoke VPC : uw1landing 10.16.101.0/24
Transit VPC : uw1transit 10.16.0.0/23

The CSR VPC on the right side simulates FiServ.

First design consideration

The first design consideration that comes up: where do we establish the BGP over IPSec tunnels? From a BGP Spoke (uw1landing), illustrated in the diagram above as solid blue lines, or from uw1transit, illustrated as dotted blue lines?

  1. A regular IPSec tunnel is pinned to a single CPU core, which can limit it to a theoretical throughput of 1.25Gbps.
  2. Aviatrix has a proprietary solution called High Performance Encryption (HPE, also known as Insane Mode) that automatically creates multiple IPSec tunnels between Aviatrix Gateways, so that multiple flows can be distributed among these tunnels. HPE requires multiple IPs (secondary private IPs when going through a private underlay, or public IPs when going through the Internet) to build these tunnels, and it utilizes multiple CPU cores. However, HPE connections can only exist on an Aviatrix Spoke to Aviatrix Transit attachment, or on an Aviatrix Transit peering.
  3. When Aviatrix Gateways build external connections with third-party vendors, we cannot use HPE because the third party won’t support it, so we can only use the gateway’s eth0 to build the IPSec tunnel. For each Aviatrix external connection, we always attempt to create two tunnels: the first from the Aviatrix primary gateway, the second from the Aviatrix HA gateway. See my blog "Aviatrix Site to Cloud Connection demystified" to learn more.
  4. Aviatrix has observed that customers using more than 10 external connections see a performance impact on the gateways, as these external connections are pinned to a single core. It is therefore advised not to exceed 10 external connections if they land on the Transit, or simply to move these external connections to a dedicated landing spoke.

Data path

Let’s assume that we have a dedicated landing spoke. The traffic flow, indicated by the light blue line, is then:

  1. From the instance in uw1spoke1 to the uw1spoke1 GW eth0.
  2. The uw1spoke1 GW makes a routing decision and forwards to the tunnels between uw1spoke1 and uw1transit.
  3. The uw1transit GW makes a routing decision and forwards to the tunnels between uw1transit and uw1landing.
  4. The uw1landing GW makes a routing decision and forwards to the tunnels between uw1landing and CSR1.

Second decision point

The next decision is: where do we apply the SNAT rules? On uw1spoke1, on uw1transit, or on uw1landing?

To answer this question, we have to know some of the basics of NAT:

  1. NAT is stateful, and the Aviatrix primary and HA gateways don’t synchronize NAT rules or session state.
    • This means that when you perform NAT, you need to make sure the return traffic lands on the same gateway as the initial traffic, or the traffic will be dropped.
  2. You need to specify the NAT rules on both the primary and the HA gateway.
  3. Aviatrix allows you to perform NAT on eth0 (interface), on non-HPE tunnels between an Aviatrix Spoke and an Aviatrix Transit (connection), or on external connections (connection).
    • In Controller, you can select an interface and a connection. When you select both and save, the connection overrides the interface.
    • In CoPilot, you cannot select eth0 directly. When you select none for Connection, eth0 is used instead.
    • When HPE is used between an Aviatrix Spoke and an Aviatrix Transit, applying NAT rules on that connection causes traffic to fail intermittently. HPE creates multiple tunnels, and multiplying them by the number of NAT rules would quickly exhaust iptables, so Aviatrix currently applies a NAT rule only to the first tunnel. As a result, NAT cannot be applied to HPE connections.
  4. Be aware of the sequence of DNAT, routing, and SNAT. This helps you pick the correct interface or tunnel to apply the NAT rules to. Credit to Barry Li for this simple and powerful diagram illustrating the sequence:
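The DNAT, routing, SNAT ordering can also be sketched in a few lines of Python. This is only an illustrative model, not how the gateway is implemented: the `Packet` class, the `ROUTES` table, and the `ToTransit` connection name are assumptions made up for the example, while the SNAT rule mirrors the uw1landing entries shown later in this post.

```python
from dataclasses import dataclass, replace
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


# Hypothetical route table: destination CIDR -> egress connection.
ROUTES = {
    "10.32.100.0/24": "ToCSR1@site2cloud",
    "10.16.100.0/24": "ToTransit",  # assumed name, for illustration only
}

# SNAT rules: (source CIDR, destination CIDR, egress connection, SNAT IP).
SNAT_RULES = [
    ("10.16.100.0/24", "10.32.100.0/24", "ToCSR1@site2cloud", "33.33.33.33"),
]


def route(pkt):
    """Pick the egress connection from the packet's destination."""
    for cidr, connection in ROUTES.items():
        if ip_address(pkt.dst) in ip_network(cidr):
            return connection
    return None


def snat(pkt, egress):
    """Rewrite the source only if a rule matches the chosen egress connection."""
    for src_cidr, dst_cidr, connection, snat_ip in SNAT_RULES:
        if (connection == egress
                and ip_address(pkt.src) in ip_network(src_cidr)
                and ip_address(pkt.dst) in ip_network(dst_cidr)):
            return replace(pkt, src=snat_ip)
    return pkt


def process(pkt):
    # Step 1: DNAT would run here, before routing (no DNAT rules in this sketch).
    egress = route(pkt)        # Step 2: routing decision
    return egress, snat(pkt, egress)  # Step 3: SNAT, keyed on the egress connection


print(process(Packet("10.16.100.10", "10.32.100.20")))
```

Because the SNAT rule is matched against the egress connection, it only makes sense to attach it to an interface or tunnel that the routing decision actually selects.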

Now back to our original question: where do we apply the SNAT rules? Since SNAT rules are only applied after the routing decision, in this data path the SNAT rules can only be applied on:

  1. the uw1spoke1 gateways, selecting the connection for the uw1spoke1 to uw1transit attachment. In this case, the uw1spoke1 to uw1transit attachment cannot be HPE.
  2. or the uw1transit gateways, selecting the connection for the uw1transit to uw1landing attachment. In this case, the uw1landing to uw1transit attachment cannot be HPE.
  3. or the uw1landing gateways, selecting the external connection from uw1landing to CSR1.
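The three options above boil down to one rule: SNAT can be attached to a gateway's egress connection only if that connection is not HPE. A minimal sketch of that rule follows. The `Hop` class, the `build_path` helper, and the connection names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    gateway: str  # gateway making the routing decision
    egress: str   # connection selected by that routing decision
    hpe: bool     # True if the egress connection uses HPE tunnels


def build_path(spoke_hpe, landing_hpe):
    """Data path from uw1spoke1 to CSR1 (connection names are illustrative)."""
    return [
        Hop("uw1spoke1", "uw1spoke1-to-uw1transit", spoke_hpe),
        Hop("uw1transit", "uw1transit-to-uw1landing", landing_hpe),
        # External connections to third parties cannot use HPE.
        Hop("uw1landing", "uw1landing-to-CSR1", False),
    ]


def snat_candidates(path):
    """Gateways whose egress connection can carry an SNAT rule."""
    return [hop.gateway for hop in path if not hop.hpe]


# With HPE on both attachments, only the landing spoke remains.
print(snat_candidates(build_path(spoke_hpe=True, landing_hpe=True)))
```

Under these assumptions, the landing spoke is the only placement that stays valid regardless of whether the attachments use HPE.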

The final consideration

Let’s say we decide to apply the SNAT rules on both uw1landing gateways. How do we make sure the return traffic always lands on the same gateway where SNAT was performed?

In the diagram below, we can see how the SNAT entries are programmed:

On primary GW uw1landing

SrcCIDR         DestCIDR        Protocol  Connection         SNAT IPs
10.16.100.0/24  10.32.100.0/24  All       ToCSR1@site2cloud  33.33.33.33
10.64.100.0/24  10.32.100.0/24  All       ToCSR1@site2cloud  4.4.4.3

On HA GW uw1landing-hagw

SrcCIDR         DestCIDR        Protocol  Connection         SNAT IPs
10.16.100.0/24  10.32.100.0/24  All       ToCSR1@site2cloud  33.33.33.34
10.64.100.0/24  10.32.100.0/24  All       ToCSR1@site2cloud  4.4.4.4

Note that each gateway uses its own unique SNAT IPs.

On the external connection to CSR1, we need to advertise all the SNAT IPs toward CSR1, as shown in the middle blue text box.
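A quick way to sanity-check such a plan is to treat the two SNAT tables as data. The sketch below is an assumption-laden helper, not an Aviatrix tool: the `SNAT_TABLE` layout and the `validate` function are made up for this example. It checks that no SNAT IP is reused across gateways, that none of them falls in RFC1918 space, and returns the list of SNAT IPs that must be advertised toward CSR1.

```python
from ipaddress import ip_address, ip_network

# SNAT IP per source CIDR, per gateway (values taken from the tables above).
SNAT_TABLE = {
    "uw1landing": {
        "10.16.100.0/24": "33.33.33.33",
        "10.64.100.0/24": "4.4.4.3",
    },
    "uw1landing-hagw": {
        "10.16.100.0/24": "33.33.33.34",
        "10.64.100.0/24": "4.4.4.4",
    },
}

RFC1918 = [ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def is_rfc1918(ip):
    """True if the address falls inside one of the three RFC1918 blocks."""
    return any(ip_address(ip) in net for net in RFC1918)


def validate(table):
    """Check the SNAT plan and return the SNAT IPs to advertise, sorted numerically."""
    ips = [ip for rules in table.values() for ip in rules.values()]
    if len(ips) != len(set(ips)):
        raise ValueError("an SNAT IP is reused across gateways")
    private = [ip for ip in ips if is_rfc1918(ip)]
    if private:
        raise ValueError(f"SNAT IPs in RFC1918 space: {private}")
    return sorted(ips, key=ip_address)


print(validate(SNAT_TABLE))
```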

For return traffic, Aviatrix programs routes for the SNAT IPs so that the traffic is always forwarded back to the gateway where SNAT was performed.

For example, on the uw1landing primary gateway, traffic initiated from 10.16.100.0/24 is SNATed to 33.33.33.33. On uw1landing-hagw, Aviatrix programs 33.33.33.33/32 to point to the uw1landing primary gateway, so if return traffic ever lands on the HA gateway, it is forwarded to the primary gateway where SNAT was performed.

Another example: on uw1landing-hagw, traffic initiated from 10.64.100.0/24 is SNATed to 4.4.4.4. Return traffic that lands on the uw1landing primary gateway is forwarded to uw1landing-hagw via the route table.
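The two examples can be modeled as a simple ownership lookup. This is a simplified sketch of the behavior described above, not the actual route programming: the `SNAT_OWNER` map and the `return_path` function are hypothetical names used only for this illustration.

```python
# Which gateway performed SNAT for each SNAT IP (from the tables above).
SNAT_OWNER = {
    "33.33.33.33": "uw1landing",
    "4.4.4.3": "uw1landing",
    "33.33.33.34": "uw1landing-hagw",
    "4.4.4.4": "uw1landing-hagw",
}


def return_path(landed_on, dst_ip):
    """Gateways that return traffic traverses before the SNAT is reversed."""
    owner = SNAT_OWNER[dst_ip]
    if owner == landed_on:
        return [landed_on]       # session state is local, handle it here
    return [landed_on, owner]    # the /32 route points at the owning gateway


# Return traffic for 33.33.33.33 that lands on the HA gateway
# is handed over to the primary gateway.
print(return_path("uw1landing-hagw", "33.33.33.33"))
```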
