Enterprise customers commonly run workloads in AWS public subnets, where the default route (0.0.0.0/0) points to the AWS internet gateway (IGW). Reference: AWS Internet Gateway Documentation
The IGW performs one-to-one NAT between the public IP and the private IP assigned to the instance. You can control inbound and outbound traffic with Security Groups, which restrict access by protocol and IP range. However, the IGW gives you little visibility into the traffic entering or leaving your instance, so you may need VPC Flow Logs to gain some level of visibility. Some examples of flow logs can be found here: Flow log record examples. You may find that they lack detail and are difficult to read.
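To illustrate why raw flow logs are hard to read, here is a minimal Python sketch that splits one record in the default (version 2) VPC Flow Logs format into named fields and prints a one-line summary. The sample record is made up for illustration: the account ID, ENI ID, timestamps, and the client address 203.0.113.10 (a documentation-only range) are assumptions, and only the private IP of pub-1a comes from this post.

```python
# Field order of the default (version 2) VPC Flow Logs record format.
FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)

# A few common IANA protocol numbers, used only to make the output readable.
PROTOCOLS = {"1": "ICMP", "6": "TCP", "17": "UDP"}


def parse_flow_log(line: str) -> dict:
    """Split one space-separated default-format record into named fields."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    record["protocol_name"] = PROTOCOLS.get(record["protocol"], record["protocol"])
    return record


def describe(record: dict) -> str:
    """Render a parsed record as a one-line, human-readable summary."""
    return (
        f"{record['action']} {record['protocol_name']} "
        f"{record['srcaddr']}:{record['srcport']} -> "
        f"{record['dstaddr']}:{record['dstport']} "
        f"({record['packets']} packets, {record['bytes']} bytes)"
    )


# Made-up sample record: inbound SSH to pub-1a from a documentation-range IP.
SAMPLE = (
    "2 123456789012 eni-0123456789abcdef0 203.0.113.10 10.32.1.36 "
    "49152 22 6 20 4249 1700000000 1700000060 ACCEPT OK"
)

if __name__ == "__main__":
    print(describe(parse_flow_log(SAMPLE)))
```

Even after parsing, a record only carries addresses, ports, a protocol number, and counters, which is the limited level of detail referred to above.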
For enterprise customers that value visibility and security, as well as simplified IT operations, Aviatrix has designed a Public Subnet Filtering (PSF) gateway feature for workloads in AWS public subnets.
Step-by-step Aviatrix Public Subnet Filtering Gateway deployment
Existing VPC setup
Existing VPC ue1spoke1 has CIDR range of 10.32.1.0/24. It has two public subnets and two private subnets:
In the two public subnets, we have two instances:
pub-1a with private IP of 10.32.1.36, public IP of 188.8.131.52
pub-1b with private IP of 10.32.1.52, public IP of 184.108.40.206
Outbound traffic from pub-1b leaves through eth0, is translated by the IGW to the instance's public IP, and is then sent to the internet.
Deploy Aviatrix Public Subnet Filtering Gateway
Aviatrix Controller -> Security -> Public Subnet
Each PSF gateway looks for the next unused /26 address space in the existing VPC CIDR. (If there is no free address space, you may need to add a /25 range to your existing VPC CIDR: one /26 for the primary PSF gateway and one /26 for the HA PSF gateway.)
You also need to pick which public subnet route table the PSF gateway will manage.
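The "next unused /26" search can be sketched with Python's standard ipaddress module. This is only an illustration of the idea, not the Controller's actual algorithm. The function name next_free_subnet is hypothetical, and the input lists only the two public subnets named in this post; the private subnet CIDRs are not given here, so the result is illustrative rather than the range the Controller really picked.

```python
import ipaddress


def next_free_subnet(vpc_cidr, used_subnets, new_prefix=26):
    """Return the first new_prefix-sized block in vpc_cidr that overlaps
    none of used_subnets, or None if no such block exists."""
    vpc = ipaddress.ip_network(vpc_cidr)
    if new_prefix < vpc.prefixlen:
        return None  # the VPC CIDR is smaller than the requested block
    used = [ipaddress.ip_network(s) for s in used_subnets]
    for candidate in vpc.subnets(new_prefix=new_prefix):
        if not any(candidate.overlaps(u) for u in used):
            return candidate
    return None


if __name__ == "__main__":
    # Only the two public subnets from this post; private subnets omitted.
    used = ["10.32.1.32/28", "10.32.1.48/28"]
    primary = next_free_subnet("10.32.1.0/24", used)
    print("primary PSF subnet:", primary)
    # The HA gateway would need a second free /26.
    ha = next_free_subnet("10.32.1.0/24", used + [str(primary)])
    print("HA PSF subnet:", ha)
```

If the function returns None, the VPC has no free /26 left, which is the case where an extra CIDR range has to be added to the VPC first.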
The Aviatrix Controller uses the free /26 address space to create the aviatrix-<psf-gw-name> subnet, which has its own route table, aviatrix-Aviatrix-Filter-Gateway, where 0.0.0.0/0 points to the IGW.
Since we are telling the Aviatrix PSF gateway to manage public subnet ue1spoke1-Public-1-us-east-1a 10.32.1.32/28, a special route table, aviatrix-Aviatrix-Ingress-routing, is also created and associated with the IGW as an Edge route table. Anything destined for ue1spoke1-Public-1-us-east-1a 10.32.1.32/28 is forwarded to the PSF server ENI:
Aviatrix also starts managing the route table for public subnet ue1spoke1-Public-1-us-east-1a 10.32.1.32/28, where 0.0.0.0/0 now points to the PSF server ENI:
Current traffic flow. Anything marked RED is the change.
Traffic from the internet to pub-1a hits the Edge route table first. Since its destination falls within ue1spoke1-Public-1-us-east-1a 10.32.1.32/28, it is forwarded to the aviatrix-psf1 gateway, which then forwards it to the target pub-1a via the local route. The reply from pub-1a is subject to its subnet route table, where 0.0.0.0/0 points back to the aviatrix-psf1 gateway; aviatrix-psf1 then forwards the traffic out via the IGW.
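The path above follows from ordinary longest-prefix route selection, which can be modelled in a few lines of Python. This is a simplified sketch: the three dictionaries mirror the route tables described in this post, the local route for the VPC CIDR is the one AWS places in every route table, and the client address 203.0.113.10 is a hypothetical documentation-range IP.

```python
import ipaddress


def lookup(route_table, destination):
    """Return the target of the most specific (longest-prefix) matching route."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if dest in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# aviatrix-Aviatrix-Ingress-routing, associated with the IGW (Edge route table)
EDGE_ROUTES = {
    "10.32.1.0/24": "local",
    "10.32.1.32/28": "aviatrix-psf1 ENI",
}
# Route table of ue1spoke1-Public-1-us-east-1a, now managed by the PSF gateway
PUBLIC_1A_ROUTES = {
    "10.32.1.0/24": "local",
    "0.0.0.0/0": "aviatrix-psf1 ENI",
}
# aviatrix-Aviatrix-Filter-Gateway, used by the PSF gateway's own subnet
PSF_SUBNET_ROUTES = {
    "10.32.1.0/24": "local",
    "0.0.0.0/0": "IGW",
}

if __name__ == "__main__":
    client, pub_1a = "203.0.113.10", "10.32.1.36"
    print("inbound at IGW :", lookup(EDGE_ROUTES, pub_1a))
    print("PSF to pub-1a  :", lookup(PSF_SUBNET_ROUTES, pub_1a))
    print("pub-1a reply   :", lookup(PUBLIC_1A_ROUTES, client))
    print("PSF to internet:", lookup(PSF_SUBNET_ROUTES, client))
```

The /28 entry in the Edge route table wins over the /24 local route because it is more specific, which is what steers inbound traffic through the PSF gateway.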
Visibility and control gained by Aviatrix PSF gateway in CoPilot
Since the PSF gateway is transparently inline with the traffic, let’s check what type of visibility and control we are gaining so far:
In CoPilot -> Monitor -> FlowIQ
We can see a nice diagram showing Source IP, Destination IP, Source port, Destination Port, IP Version and Protocol, Autonomous Systems, Flow Exporters, and Flow Locality.
In Source IP, if we click on pub-1a’s IP: 10.32.1.36, a query string will be built automatically as a filter.
Now that the view is narrowed down to traffic from pub-1a, all other fields have also been updated, such as total traffic and destination IP. Within Autonomous Systems, you can quickly see that it is trying to access Google, Digital Ocean, NTT, and others.
In Geolocation, we can see traffic going to the US and the UK.
In Records, we can see a further breakdown of the traffic flow.
Within the Records tab, you can add or remove columns by clicking the grid icon, which lists many additional columns that can enrich your investigation.
The Flows tab helps you quickly identify which flows consumed the most bandwidth. Notice that the filters carry over between tabs.
Another place to gain additional visibility is via CoPilot -> Cloud Fabric -> Gateways -> Specialty Gateways -> Click on the PSF gateway
Feel free to explore the other sections. For now, click the … menu, then click Gateway Diagnostics.
Here you get many additional tools. My personal favorite is packet capture, which you can later download as a PCAP file for deeper analysis.
What about High Availability?
In the initial diagram, we have two public subnets, each with one instance running. Although a single PSF gateway can manage more than one public subnet route table, that PSF gateway is then a single point of failure.
We can launch an HA PSF gateway to provide additional availability:
In Controller -> Security -> Public Subnet -> Select the origin PSF gateway -> Actions -> Enable HA
The workflow is simple, since the VPC is already known. In this case, we selected the next unused /26 address space, a different AZ (1b), and the 1b public subnet route table to be managed.
An additional subnet for the PSF HA gateway has been created from the unused /26 address range. It uses the same route table, aviatrix-Aviatrix-Filter-Gateway.
The special route table associated with the IGW as an Edge route table, aviatrix-Aviatrix-Ingress-routing, has been updated. Anything destined for ue1spoke1-Public-1-us-east-1a 10.32.1.32/28 is forwarded to the aviatrix-psf1 server ENI, and anything destined for ue1spoke1-Public-2-us-east-1b 10.32.1.48/28 is forwarded to the aviatrix-psf1-hagw server ENI.
Since we are now managing the route table of public subnet ue1spoke1-Public-2-us-east-1b 10.32.1.48/28, it has been updated so that 0.0.0.0/0 points to the aviatrix-psf1-hagw server ENI instead of the IGW.
Traffic flow for both public subnets when both PSF gateways are up
What if one of the PSF gateways fails?
Test: the primary PSF gateway is having trouble, and Aviatrix has marked the gateway as down:
The Aviatrix Controller has modified the special route table associated with the IGW as an Edge route table, aviatrix-Aviatrix-Ingress-routing, so that the routes to both public subnets now point to the remaining healthy PSF HA gateway ENI:
Both PSF-managed public subnet route tables have also been updated so that 0.0.0.0/0 points to the remaining healthy PSF HA gateway ENI:
Traffic flow when the primary PSF gateway is down:
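The failover behaviour described above can be modelled as a small function that decides which gateway ENI each managed subnet should point to. This is a sketch of the observed outcome, not the Controller's implementation. The function name build_routes is hypothetical, and raising an error when no gateway is healthy is an assumption, since this post does not cover that case.

```python
# Which PSF gateway normally serves each managed public subnet (from this post).
SUBNET_OWNER = {
    "10.32.1.32/28": "aviatrix-psf1",
    "10.32.1.48/28": "aviatrix-psf1-hagw",
}


def build_routes(subnet_owner, healthy):
    """Map each managed subnet CIDR to the gateway that should carry its traffic.

    The same target would be used twice per subnet: in the Edge route table
    (destination = subnet CIDR) and in the subnet's own route table
    (destination = 0.0.0.0/0).
    """
    if not healthy:
        # Assumption: behaviour with no healthy gateway is not described here.
        raise RuntimeError("no healthy PSF gateway available")
    fallback = sorted(healthy)[0]
    return {
        cidr: owner if owner in healthy else fallback
        for cidr, owner in subnet_owner.items()
    }


if __name__ == "__main__":
    print("both up     :", build_routes(SUBNET_OWNER, {"aviatrix-psf1", "aviatrix-psf1-hagw"}))
    print("primary down:", build_routes(SUBNET_OWNER, {"aviatrix-psf1-hagw"}))
```

With both gateways healthy, each subnet keeps its own gateway; with the primary down, both subnets point to aviatrix-psf1-hagw, matching the route table changes shown above.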
Additionally, Aviatrix gateways have a Single AZ HA feature, which is enabled by default. When the Controller observes that a PSF gateway has gone down, it tries to bring it back up.
Additional Security with ThreatIQ
Since CoPilot now tracks all flows through the PSF gateway, which manages the selected public subnet route tables, we can check the flow information against a reputation database to identify and even block traffic to or from known bad IP addresses. The following shows threats detected.
We can configure ThreatIQ to send out alerts and/or block traffic
Traffic that has been dropped by ThreatIQ
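Conceptually, this kind of reputation check compares each flow's endpoints against a list of known bad addresses and then either alerts or drops. The Python sketch below illustrates that idea only; it is not how ThreatIQ is implemented. The feed contents are hypothetical and use documentation-only ranges, and the names is_threat and evaluate are made up for this example.

```python
import ipaddress

# Hypothetical reputation feed, built from documentation-only address ranges.
BAD_NETWORKS = [
    ipaddress.ip_network(n) for n in ("198.51.100.0/24", "203.0.113.66/32")
]


def is_threat(ip):
    """True if the address falls inside any network in the reputation feed."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BAD_NETWORKS)


def evaluate(flows, block=False):
    """Return (flow, verdict) pairs; verdict is 'allow', 'alert', or 'drop'.

    With block=False, matching flows only raise an alert; with block=True
    they are dropped, mirroring the alert and block options mentioned above.
    """
    results = []
    for src, dst in flows:
        if is_threat(src) or is_threat(dst):
            results.append(((src, dst), "drop" if block else "alert"))
        else:
            results.append(((src, dst), "allow"))
    return results


if __name__ == "__main__":
    flows = [
        ("10.32.1.36", "198.51.100.7"),   # pub-1a to a listed network
        ("203.0.113.66", "10.32.1.52"),   # listed host to pub-1b
        ("10.32.1.36", "192.0.2.10"),     # not listed
    ]
    for flow, verdict in evaluate(flows, block=True):
        print(flow, verdict)
```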
Want additional security?
It is great that we now have greater visibility at layer 4, with protocols, ports, IPs, and so on. While CoPilot can map some of those IPs to autonomous systems, that mapping lacks accuracy. By combining PSF with the Aviatrix FQDN gateway, we can gain even more granular visibility and control.
This enables the FQDN feature on the PSF gateways and puts them in discovery mode.
Then I rebooted the pub-1a and pub-1b instances, logged on to them, and ran: sudo apt update
Under Actions, click Show to list all the sites and ports each PSF gateway observed.
You can click on the Download button to download the discovered FQDN and ports into a file, which can be used in later steps.
Based on the findings, we can create either a whitelist (allow list, allowing only specific FQDNs and ports) or a blacklist (deny list, denying specific FQDNs and ports but allowing everything else).
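The difference between the two modes can be shown with a short Python sketch. It assumes simple shell-style wildcard matching on the FQDN plus an exact port match, which is an assumption made for illustration and not a description of the Aviatrix matching rules. The function name is_allowed and the "*.ubuntu.com" entry are hypothetical; esm.ubuntu.com and google.com are the sites tested later in this post.

```python
from fnmatch import fnmatchcase


def is_allowed(fqdn, port, rules, mode):
    """Decide whether a connection is permitted.

    rules is a list of (fqdn_pattern, port) pairs.
    mode 'allow': permit only connections that match a rule (whitelist).
    mode 'deny' : block only connections that match a rule (blacklist).
    """
    matched = any(
        fnmatchcase(fqdn.lower(), pattern.lower()) and port == rule_port
        for pattern, rule_port in rules
    )
    if mode == "allow":
        return matched
    if mode == "deny":
        return not matched
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    whitelist = [("esm.ubuntu.com", 443), ("*.ubuntu.com", 80)]
    print(is_allowed("esm.ubuntu.com", 443, whitelist, "allow"))  # listed
    print(is_allowed("google.com", 80, whitelist, "allow"))       # not listed

    blacklist = [("google.com", 80)]
    print(is_allowed("google.com", 80, blacklist, "deny"))        # listed
    print(is_allowed("esm.ubuntu.com", 443, blacklist, "deny"))   # not listed
```

A whitelist fails closed (anything not listed is blocked), while a blacklist fails open, so the whitelist is the stricter choice once discovery has produced a complete list.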
Now let’s Edit the whitelist tag we created:
We can either manually put in entries, or we can use import to use the discovery files saved earlier
Imported FQDN and ports for the whitelist
Click on Update to make sure the entries are saved
Now let’s make sure to attach the PSF gateway to this tag:
Last, make sure to enable the tag:
Log on to pub-1a for testing:
Connecting to http://google.com times out, as it is not in the whitelist.
Connecting to https://esm.ubuntu.com/ works as expected.
Additional observability in CoPilot -> Security -> Egress -> Overview
On the Monitor page:
One additional note: when the FQDN feature is enabled on the PSF gateway, the Aviatrix Controller only updates the PSF-managed public subnet route tables so that 0.0.0.0/0 points to the PSF gateway ENI. The route tables of unmanaged public subnets and of private subnets are not updated by the Aviatrix Controller, so these subnets continue to use their existing egress method.