Aviatrix control plane HA in AWS

The Aviatrix Controller is not in the data path. However, while the controller is down you cannot change the current configuration, monitor gateway status in order to update route tables, or authenticate new VPN user connection requests.

To keep the Aviatrix Controller in AWS highly available and protect it from a single-AZ failure, Aviatrix has developed a CloudFormation template that uses an Auto Scaling Group and a Lambda function to automatically detect controller failure, redeploy the controller, and restore its configuration.

If you need regional HA for the Aviatrix Controller, we recommend the following:

Make sure the S3 bucket used for Controller backups is replicated to another region.
Pre-create an Aviatrix Controller in that region, allocate an EIP for it, pre-upgrade it to the specific version needed for the restore, and keep it shut down.
Whitelist that EIP in your firewall policy.
In the event of a failure, start the standby Aviatrix Controller and restore the configuration from the backup.
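The regional HA recommendation above is essentially a readiness checklist. The Python sketch below models it as data so each item can be verified before you actually need to fail over. Everything in it is hypothetical: `StandbyController`, `RegionalHAPlan`, and `readiness_problems` are illustrative names, not an Aviatrix or AWS API, and the version check assumes the standby must run the same version as the primary controller whose backup it will restore.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class StandbyController:
    """Hypothetical model of the pre-created standby controller in the DR region."""
    region: str
    eip: Optional[str]  # Elastic IP pre-allocated to the standby controller
    version: str        # version the standby was pre-upgraded to
    state: str          # EC2 instance state, e.g. "stopped" or "running"


@dataclass
class RegionalHAPlan:
    """Hypothetical model of the inputs to the regional HA checklist."""
    primary_region: str
    primary_version: str
    backup_replica_regions: Set[str]  # regions the S3 backup bucket is replicated to
    firewall_allowlist: Set[str]      # IPs whitelisted in your firewall policy
    standby: StandbyController


def readiness_problems(plan: RegionalHAPlan) -> List[str]:
    """Return the checklist items not yet satisfied; an empty list means ready."""
    sb = plan.standby
    problems: List[str] = []
    if sb.region == plan.primary_region:
        problems.append("standby controller is in the same region as the primary")
    if sb.region not in plan.backup_replica_regions:
        problems.append("S3 backup bucket is not replicated to the standby region")
    if sb.eip is None:
        problems.append("no EIP allocated to the standby controller")
    elif sb.eip not in plan.firewall_allowlist:
        problems.append("standby EIP is not whitelisted in the firewall policy")
    # Assumption: the backup should be restored onto the same controller version.
    if sb.version != plan.primary_version:
        problems.append("standby version differs from the primary controller version")
    if sb.state != "stopped":
        problems.append("standby controller should stay shut down until failover")
    return problems


# Usage with placeholder values (documentation IP range, made-up version string).
plan = RegionalHAPlan(
    primary_region="us-east-1",
    primary_version="6.5",
    backup_replica_regions={"us-west-2"},
    firewall_allowlist={"203.0.113.10"},
    standby=StandbyController("us-west-2", "203.0.113.10", "6.5", "stopped"),
)
print(readiness_problems(plan))  # []
```

Running a check like this periodically catches drift, such as the primary controller being upgraded while the standby stays on the old version.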

Before enabling HA for an existing Aviatrix Controller in AWS, prepare the following:

The controller VPC should preferably contain two public subnets in different AZs to avoid a single-AZ failure.
A public subnet in AWS is one whose route table has a 0.0.0.0/0 route pointing to an Internet Gateway (IGW).
Enable controller backup to an S3 bucket.
(Optional) Enable an ALB/NLB that targets the Auto Scaling Group.
1. Some customers want to access the controller via a private IP from the internal network. Because a replacement controller instance gets a new private IP, a load balancer eases the DNS update issue by providing a stable endpoint.
2. An ALB can also be used to offload SSL and to enable WAF.
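The public-subnet definition above can be checked mechanically. This is a minimal sketch, assuming you have already fetched route tables and subnets as dictionaries shaped like the EC2 `DescribeRouteTables` and `DescribeSubnets` responses (`Routes`, `DestinationCidrBlock`, `GatewayId`, `State`, `SubnetId`, `AvailabilityZone`). The `route_table_by_subnet` mapping is a hypothetical input you would build yourself; note that a subnet without an explicit route table association uses the VPC's main route table.

```python
from typing import Any, Iterable, Mapping, Set


def is_public_route_table(route_table: Mapping[str, Any]) -> bool:
    """True if the table has an active 0.0.0.0/0 route targeting an Internet Gateway."""
    for route in route_table.get("Routes", []):
        if (
            route.get("DestinationCidrBlock") == "0.0.0.0/0"
            and route.get("GatewayId", "").startswith("igw-")
            and route.get("State", "active") == "active"
        ):
            return True
    return False


def public_subnet_azs(
    subnets: Iterable[Mapping[str, Any]],
    route_table_by_subnet: Mapping[str, Mapping[str, Any]],
) -> Set[str]:
    """Return the set of AZs that contain at least one public subnet."""
    azs: Set[str] = set()
    for subnet in subnets:
        table = route_table_by_subnet.get(subnet["SubnetId"], {})
        if is_public_route_table(table):
            azs.add(subnet["AvailabilityZone"])
    return azs


def survives_single_az_failure(
    subnets: Iterable[Mapping[str, Any]],
    route_table_by_subnet: Mapping[str, Mapping[str, Any]],
) -> bool:
    """The recommendation above: public subnets in at least two different AZs."""
    return len(public_subnet_azs(subnets, route_table_by_subnet)) >= 2
```

A 0.0.0.0/0 route to a NAT gateway appears under `NatGatewayId` rather than `GatewayId`, so such a subnet is correctly treated as private.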

To enable HA for an existing Aviatrix Controller in AWS, launch this CloudFormation template.

The parameters provided here are set as environment variables for the Lambda function.

Pick the VPC that contains the Aviatrix Controller.
Select one or more subnets, including the existing controller's subnet and an additional subnet in a different AZ.

Provide the Name tag of the Aviatrix Controller.
Provide the S3 bucket that contains the controller backup.
Provide an email address for Auto Scaling Group event notifications.
By default, the Lambda function accesses the controller via its public IP. You can have it access the controller via its private IP instead, but you must then manually attach the Lambda function to VPC subnets and make sure it has either an EIP or a NAT to reach the internet.
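Because the template passes these parameters to the Lambda function as environment variables, the function has to read and validate them at startup. The sketch below shows one way to do that in Python. The variable names (`CONTROLLER_NAME_TAG`, `BACKUP_BUCKET`, `NOTIFY_EMAIL`, `USE_PRIVATE_IP`) and the helpers `load_config` and `controller_address` are hypothetical; the real template may use different names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical environment variable names, not the template's actual ones.
REQUIRED_VARS = ("CONTROLLER_NAME_TAG", "BACKUP_BUCKET", "NOTIFY_EMAIL")


@dataclass(frozen=True)
class HAConfig:
    controller_name_tag: str  # Name tag of the Aviatrix Controller
    backup_bucket: str        # S3 bucket holding the controller backup
    notify_email: str         # address subscribed to Auto Scaling Group events
    use_private_ip: bool      # False (the default): reach the controller via its public IP


def load_config(env: Optional[Mapping[str, str]] = None) -> HAConfig:
    """Build the Lambda settings from environment variables, failing fast if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    return HAConfig(
        controller_name_tag=env["CONTROLLER_NAME_TAG"],
        backup_bucket=env["BACKUP_BUCKET"],
        notify_email=env["NOTIFY_EMAIL"],
        use_private_ip=env.get("USE_PRIVATE_IP", "false").strip().lower() in ("1", "true", "yes"),
    )


def controller_address(cfg: HAConfig, public_ip: str, private_ip: str) -> str:
    """Pick the IP the Lambda function should use to reach the controller."""
    # Private-IP access only works if the Lambda function is attached to VPC
    # subnets and still has internet egress through an EIP or a NAT.
    return private_ip if cfg.use_private_ip else public_ip
```

Accepting an optional `env` mapping keeps the function testable without touching the real process environment.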

A Lambda function, named by appending “-ha” to your controller’s name, is created to restore the controller configuration from the S3 bucket.

An IAM role, named by appending “-role-lambda” to your controller’s name, is created to give the Lambda function the permissions it needs.

An Auto Scaling Group with the same name as the controller is created, with its size set to 1 (minimum capacity = 0, maximum capacity = 1, desired capacity = 1).

The existing controller’s security groups are reused.

If you provided multiple subnets in the CloudFormation template, the Auto Scaling Group uses these subnets.

The existing controller instance is added to the Auto Scaling Group.

An SNS topic with the same name as the controller is created.

Two subscriptions to the SNS topic are created:

One for email notification (the recipient must click the confirmation link in the email to receive further emails)
One for triggering the Lambda function created earlier
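To summarize what the template creates, the sketch below derives each resource name from the controller name using the conventions described above, together with the Auto Scaling Group capacities and the two SNS subscriptions. The function `planned_ha_resources` and its dictionary layout are hypothetical and are not the template's actual output format; in AWS, a Lambda subscription endpoint is the function ARN rather than its name.

```python
from typing import Any, Dict


def planned_ha_resources(controller_name: str, notify_email: str) -> Dict[str, Any]:
    """Names and sizes of the resources the template creates, per the description above."""
    lambda_name = f"{controller_name}-ha"
    return {
        "lambda_function": lambda_name,
        "lambda_role": f"{controller_name}-role-lambda",
        "auto_scaling_group": {
            "name": controller_name,
            "min_size": 0,
            "max_size": 1,
            "desired_capacity": 1,
        },
        "sns_topic": {
            "name": controller_name,
            "subscriptions": [
                # The recipient must confirm this subscription from the email.
                {"protocol": "email", "endpoint": notify_email},
                # A real subscription uses the function ARN; the name stands in here.
                {"protocol": "lambda", "endpoint": lambda_name},
            ],
        },
    }
```

For a controller named `aviatrix-controller`, this yields a Lambda function `aviatrix-controller-ha`, a role `aviatrix-controller-role-lambda`, and an Auto Scaling Group and SNS topic both named `aviatrix-controller`.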

When the Auto Scaling Group detects a controller health issue:

Auto Scaling Group
- Removes the unhealthy instance
- Creates a new instance, launching it in another AZ if available
- Sends a notification to SNS
- SNS triggers
  - Email notification
  - Lambda
    - Lambda contacts the new controller and reassigns the EIP
    - The new controller restores the backup from the S3 bucket
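The Lambda side of this flow starts from the SNS event. Auto Scaling notifications arrive as a JSON string in `Records[].Sns.Message`, with an `Event` field such as `autoscaling:EC2_INSTANCE_LAUNCH` and the affected `EC2InstanceId`. The sketch below only decides which failover steps to run; it does not call AWS or the controller. The step descriptions, the `actions_for_asg_event` helper, and the `BACKUP_BUCKET` environment variable are hypothetical, and this is not Aviatrix's actual Lambda code.

```python
import json
import os
from typing import Any, Dict, List

LAUNCH = "autoscaling:EC2_INSTANCE_LAUNCH"
TERMINATE = "autoscaling:EC2_INSTANCE_TERMINATE"


def actions_for_asg_event(event: Dict[str, Any], backup_bucket: str) -> List[str]:
    """Turn an SNS-delivered Auto Scaling notification into the failover steps to run."""
    actions: List[str] = []
    for record in event.get("Records", []):
        # SNS delivers the Auto Scaling notification body as a JSON string.
        message = json.loads(record["Sns"]["Message"])
        kind = message.get("Event")
        instance_id = message.get("EC2InstanceId", "")
        if kind == LAUNCH:
            # The replacement controller: move the EIP to it, then restore its configuration.
            actions.append(f"associate controller EIP with {instance_id}")
            actions.append(f"restore backup from s3://{backup_bucket} onto {instance_id}")
        elif kind == TERMINATE:
            actions.append(f"record termination of {instance_id}")
        # Anything else (e.g. autoscaling:TEST_NOTIFICATION) needs no action.
    return actions


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    # BACKUP_BUCKET is a hypothetical environment variable name.
    return {"actions": actions_for_asg_event(event, os.environ.get("BACKUP_BUCKET", ""))}
```

Keeping the event parsing separate from the AWS calls makes the decision logic easy to test with hand-built SNS events.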