As the number of customers onboarding to Google Cloud Platform (GCP) continues to grow, one of the most common questions asked is how to access GCP global services, such as Cloud SQL, privately and securely. The unique features of GCP networking, including the global VPC construct, a single route table shared by all subnets, and regional Cloud Routers, can be challenging for enterprise customers seeking to access GCP global services. In this blog post, I will demonstrate how the Aviatrix architecture enables customers to access GCP global services securely and efficiently.
GCP networking characteristics
- Most Cloud Service Providers (CSPs) treat virtual networks as regional constructs: when you create a virtual network, you assign one or more IP ranges to it. GCP's virtual network, the Virtual Private Cloud (VPC), is a global construct, and at creation time it has no IP range or region assigned.
- When you create a subnet in that empty VPC, you designate a region for the subnet as well as one or more IP ranges.
- Unlike other CSPs, where you can associate a route table with individual subnets, GCP has only a single route table for all subnets in the same VPC.
- All subnets in the same VPC automatically register their prefixes in this single route table with a metric of 0. This allows resources in the same VPC to talk to each other regardless of which region they are located in. However, because the metric is 0, you cannot override these system routes to redirect traffic between subnets to another network device such as a router or firewall.
- GCP global services, such as Cloud SQL, are hosted outside of customer projects in a Google-managed VPC. To access them, you create a customer-managed VPC in your project, allocate non-conflicting IP ranges under the private services connection of that global VPC to be consumed by the GCP global service, and connect these ranges to servicenetworking.googleapis.com. In the backend, a VPC peering is created between the Google-managed VPC and the customer-managed VPC.
Additional reading: Configure private services access
- GCP compute instances can have multiple NICs, but each NIC must belong to a different VPC. For routers, firewalls, or Aviatrix Transit Gateways that require multiple NICs, each NIC needs to land in a different VPC.
- A GCP compute instance can have a maximum of 8 NICs, and the number of NICs cannot be changed on the fly: to change it, the instance must be deleted and recreated. For routers, firewalls, or Aviatrix Transit Gateways that require multiple NICs, the design must be future-proof to avoid redeployment.
- To allow VPCs to dynamically exchange routes with BGP-capable devices, GCP offers Cloud Routers (CR). A CR is a regional construct; during its creation, you specify the VPC, region, and ASN it is assigned to. It can either automatically advertise all prefixes associated with the VPC it is attached to, or perform custom route advertisement (with the option to also include the VPC prefixes).
- In the VPC the CR is attached to, you can choose either Regional or Global dynamic routing mode. When Regional is set, the CR will only learn routes in the region where it was created. When Global is set, the CR will learn routes from all regions, including those from VPN/Interconnect (GCP's high-speed private connection from on-prem, equivalent to AWS Direct Connect or Azure ExpressRoute). A minimal Terraform sketch of these constructs follows this list.
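To make these characteristics concrete, here is a minimal Terraform sketch of a global custom-mode VPC, a regional subnet, and a regional Cloud Router, assuming the hashicorp/google provider. All names, ranges, and the ASN are hypothetical, chosen only for illustration.

```hcl
# Hypothetical names and ranges, for illustration only.

# The VPC itself is global and carries no IP range at creation
resource "google_compute_network" "global_vpc" {
  name                    = "demo-global-vpc"
  auto_create_subnetworks = false
  routing_mode            = "GLOBAL"   # Cloud Routers learn routes from all regions
}

# Subnets are regional and carry the IP ranges
resource "google_compute_subnetwork" "us_central1" {
  name          = "demo-subnet-us-central1"
  region        = "us-central1"
  network       = google_compute_network.global_vpc.id
  ip_cidr_range = "10.10.0.0/24"
}

# A Cloud Router is regional and tied to one VPC and ASN
resource "google_compute_router" "cr_us_central1" {
  name    = "demo-cr-us-central1"
  region  = "us-central1"
  network = google_compute_network.global_vpc.id

  bgp {
    asn = 64514
  }
}
```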
How does private access to GCP global services work natively?
As shown in the diagram below, it is quite simple. The customer creates a compute engine instance in their own managed VPC; this instance can be in any region. The customer then allocates IP ranges to be consumed by global services and connects these ranges via servicenetworking.googleapis.com. Next, they create a global service such as Cloud SQL, select Private IP and deselect Public IP, pick a region for the service, select the customer-managed VPC (a global service can only be associated with one customer-managed VPC at any given time), and select the allocated IP range. A VPC peering is created automatically. The sketch below shows the same flow in Terraform.
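Here is a minimal Terraform sketch of that flow, reusing the hypothetical global_vpc network from the earlier sketch: allocate a range, connect it to servicenetworking.googleapis.com, and create a private-IP-only Cloud SQL instance. Names, ranges, database version, and tier are all assumptions for illustration.

```hcl
# Hypothetical names and sizes, for illustration only.

# Allocate a non-conflicting range to be consumed by the Google-managed VPC
resource "google_compute_global_address" "psc_range" {
  name          = "google-managed-services-range"
  purpose       = "VPC_PEERING"
  address_type  = "INTERNAL"
  prefix_length = 20
  network       = google_compute_network.global_vpc.id
}

# Connect the allocated range to servicenetworking.googleapis.com;
# this is what creates the backend VPC peering
resource "google_service_networking_connection" "private_service_access" {
  network                 = google_compute_network.global_vpc.id
  service                 = "servicenetworking.googleapis.com"
  reserved_peering_ranges = [google_compute_global_address.psc_range.name]
}

# Cloud SQL with Private IP only, reachable over the peering
resource "google_sql_database_instance" "primary" {
  name             = "demo-sql-primary"
  database_version = "POSTGRES_14"
  region           = "us-central1"

  settings {
    tier = "db-custom-2-7680"
    ip_configuration {
      ipv4_enabled    = false                                  # no public IP
      private_network = google_compute_network.global_vpc.id   # customer-managed VPC
    }
  }

  depends_on = [google_service_networking_connection.private_service_access]
}
```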
The compute engine could be in any region, and the SQL instances can be in any region too. Remember that each VPC only has a single route table? Once traffic leaves the compute engine, the SQL instance's private IP matches the allocated IP range that points to the peering connection; once the traffic reaches the Google-managed VPC, that VPC's route table knows which region to forward the traffic to.
Quite elegant. But it would be challenging to insert a firewall into this path and make sure the traffic flows through the firewall in the correct region. And what if the accesses are coming from a different VPC?
The Aviatrix Architecture
Credit for the design below goes to my brilliant colleague Matthew Kazmar.
- Start from the top. Aviatrix suggests creating the workload VPC subnets in a single region. The Aviatrix gateways, shown in red, are deployed in the spoke VPCs. Aviatrix manages the route table and inserts RFC 1918 routes to attract traffic towards the Aviatrix Spoke Gateways.
- The Spoke Gateways are attached to the Transit Gateways below them. Each Spoke Gateway uses eth0 to build ActiveMesh or High Performance Encryption tunnels towards the Transit Gateways' eth0. The Transit Gateways' eth0 is attached to the Transit Gateway VPC, which likewise has its subnet in a single region.
- Below the Transit Gateways, we have the psc-global-vpc (PSC stands for Private Service Connection). This VPC is managed by the customer and has multiple subnets in different regions; these regions align with the Cloud SQL instance regions. Each Transit Gateway's eth1 is attached to the subnet in the same region as that Transit Gateway and that region's SQL instance.
- In the psc-global-vpc, multiple IP ranges are allocated for global services such as Cloud SQL, e.g. 10.192.0.0/20 and 10.192.16.0/20 in this diagram.
- In the psc-global-vpc subnets, regional Cloud Routers are deployed. Full-mesh BGP sessions are built from the Cloud Routers towards the Transit Gateways' eth1 interfaces. Each Cloud Router has a custom advertised route that maps to the IP range assigned to the Cloud SQL instance in the same region, e.g. the us-central1 CR advertises 10.192.0.0/20 and the us-west2 CR advertises 10.192.16.0/20 (see the sketch after this list). These advertisements are sent to the Aviatrix Transit and made available to Aviatrix Spokes, directly peered Aviatrix Transits, and other Aviatrix Transit external connections.
- When enabling the IP ranges towards the private services connection, there is an option to export custom routes. This option allows routes received by the Cloud Router to be sent towards the GCP-managed VPC hosting global services such as Cloud SQL.
- After Cloud SQL is deployed, a VPC peering is established between the customer-managed VPC and the GCP-managed VPC.
- If you are interested in testing this architecture, you can deploy it using this Terraform code.
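The sketch below shows the GCP side of the Cloud Router configuration described above: a regional Cloud Router in the psc-global-vpc advertising only its region's Cloud SQL range, plus the custom-route export on the service-networking peering. It is a minimal sketch under assumptions: the psc_global_vpc network resource, router names, ASN, and ranges are hypothetical; the peering name "servicenetworking-googleapis-com" is the name GCP typically creates but should be verified in your project; and the BGP-over-LAN interfaces and peers towards the Aviatrix Transit eth1 are omitted for brevity.

```hcl
# Hypothetical names, ASN, and ranges; peering name should be verified.

# Regional Cloud Router in psc-global-vpc advertising only the Cloud SQL
# range allocated to its region
resource "google_compute_router" "psc_us_central1" {
  name    = "psc-cr-us-central1"
  region  = "us-central1"
  network = google_compute_network.psc_global_vpc.id   # assumed psc-global-vpc resource

  bgp {
    asn            = 64515
    advertise_mode = "CUSTOM"

    advertised_ip_ranges {
      range = "10.192.0.0/20"   # Cloud SQL range for us-central1
    }
  }
}

# Export the routes learned by the Cloud Router towards the Google-managed VPC
resource "google_compute_network_peering_routes_config" "psc_export" {
  peering              = "servicenetworking-googleapis-com"
  network              = google_compute_network.psc_global_vpc.name
  export_custom_routes = true
  import_custom_routes = false
}
```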
High level architecture diagram
At a high level, this is what it would look like.
- The spoke in us-central1 will only be able to talk to the primary SQL instance in us-central1
- The spoke in us-west2 will only be able to talk to the read-replica SQL instance in us-west2
- To allow both spokes to talk to both SQL instances:
  - either peer the two Aviatrix Transits, so cross-region traffic traverses the transit peering,
  - or advertise both IP ranges from both Cloud Routers, so cross-region traffic happens within the GCP-managed VPC (a short sketch follows)
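As a sketch of the second option, each regional Cloud Router can simply advertise both allocated ranges, so either region's SQL prefix is reachable without Aviatrix transit peering. This assumes the same hypothetical psc_global_vpc network and ranges as above.

```hcl
# Hypothetical sketch: the us-west2 Cloud Router advertises both ranges.
resource "google_compute_router" "psc_us_west2" {
  name    = "psc-cr-us-west2"
  region  = "us-west2"
  network = google_compute_network.psc_global_vpc.id

  bgp {
    asn            = 64515
    advertise_mode = "CUSTOM"

    advertised_ip_ranges {
      range = "10.192.0.0/20"    # us-central1 Cloud SQL range
    }
    advertised_ip_ranges {
      range = "10.192.16.0/20"   # us-west2 Cloud SQL range
    }
  }
}
```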
Now, what if we have multiple global services, and how do we insert a firewall? Remember the constraints:
- Each global service can only connect to a single VPC
- Each NIC of a multi-NIC instance must be attached to a different VPC
- An instance can have a maximum of 8 NICs
- An instance has to be rebuilt to add or remove NICs
As seen in the prior diagram, the Aviatrix Transit uses an extra NIC, eth1, to perform BGP peering with the psc VPC. Different global services may require different psc VPCs, which would mean additional NICs on the Aviatrix Transit. The Aviatrix FireNet workflow also builds the Aviatrix Transit with additional network interfaces that communicate with the firewall. So if we combine Aviatrix Transit FireNet with the BGP peering capability, the solution becomes very rigid and won't scale when needed.
Here is the recommended architecture that brings in the needed flexibility.
- We treat the BGP-enabled Transit as a spoke
- Additional Aviatrix Transit FireNets with firewalls are created, aligned with each region
- These Aviatrix Transit FireNets have Multi-tier Transit (MTT) enabled to allow transitivity from the BGP-enabled Transit acting as a spoke
- Transit peering can be established between the MTT-enabled Transit FireNets to allow cross-region connectivity
- The MTT Transit FireNets become the regional hubs for other spokes, Site-to-Cloud VPN connections, Interconnect, and Aviatrix Edge gateways
- We can also use a much smaller instance size for the BGP transit gateways, which is especially helpful for QA/Dev environments (a hedged Terraform sketch of the attachment pattern follows)
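For readers who automate Aviatrix with its Terraform provider, the outline below sketches the attachment pattern. Treat it as a hedged sketch rather than a definitive implementation: all gateway names, account names, VPC IDs, CIDRs, and sizes are hypothetical, the GCP vpc_id format depends on your setup, and the enable_multi_tier_transit attribute should be verified against your provider and controller versions.

```hcl
# Hedged sketch only: names, VPC IDs, and sizes are hypothetical, and
# enable_multi_tier_transit should be verified against your provider version.

# Regional Transit FireNet hub with Multi-tier Transit enabled
resource "aviatrix_transit_gateway" "firenet_us_central1" {
  cloud_type                = 4                    # GCP
  account_name              = "gcp-account"
  gw_name                   = "transit-firenet-us-central1"
  vpc_id                    = "firenet-vpc-us-central1"
  vpc_reg                   = "us-central1"
  gw_size                   = "n1-standard-4"
  subnet                    = "10.20.0.0/24"
  enable_transit_firenet    = true
  enable_multi_tier_transit = true
}

# The smaller BGP-enabled Transit is peered to the FireNet hub like a spoke,
# so the routes it learns from the Cloud Routers become transitive
resource "aviatrix_transit_gateway_peering" "bgp_transit_to_firenet" {
  transit_gateway_name1 = aviatrix_transit_gateway.firenet_us_central1.gw_name
  transit_gateway_name2 = "bgp-transit-us-central1"   # existing BGP-enabled Transit
}
```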
Additional thoughts
- As of April 2023, Aviatrix Spoke gateways don't support BGP over LAN, nor do they support deploying multiple spokes into the same VPC across different subnets in different regions.
- Aviatrix is working on enabling Aviatrix Spoke deployment in a global VPC. This will be accomplished by using instance tags to send traffic for specific prefixes to a specified Spoke Gateway. However, GCP VPC peering does not export routes tied to instance tags to the peered VPC, so this will not help with a solution for global services.