Aviatrix control plane HA in AWS

The Aviatrix Controller is not in the data path. However, while the Controller is down you cannot change the current configuration, the Controller cannot monitor gateway status to make changes to route tables, and new VPN user connection requests cannot be authenticated.

To keep the Aviatrix Controller in AWS highly available and protect it against a single-AZ failure, Aviatrix has developed a CloudFormation template that uses an Auto Scaling Group and a Lambda function to automatically detect Controller failure, redeploy the Controller, and restore its configuration.

If you also want regional HA for the Aviatrix Controller, the recommendation is:

  • Make sure the S3 bucket that holds the Controller backup is replicated to another region.
  • Pre-create an Aviatrix Controller in that region, allocate an EIP to it, pre-upgrade it to the required version, and keep it shut down.
  • Whitelist the EIP in your firewall policy.
  • In the event of a failure, power on the standby Aviatrix Controller and restore from the backup.
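The first recommendation can be checked programmatically. The following is a minimal sketch, assuming you have already fetched the bucket's replication configuration (for example with boto3's `get_bucket_replication`) and pass the response dictionary in; the function name `enabled_replication_destinations` is hypothetical and not part of any Aviatrix tooling.

```python
from typing import List


def enabled_replication_destinations(replication_response: dict) -> List[str]:
    """Return destination bucket ARNs of all enabled replication rules.

    `replication_response` is assumed to have the shape returned by
    S3's GetBucketReplication API (e.g. boto3 `get_bucket_replication`).
    """
    config = replication_response.get("ReplicationConfiguration", {})
    destinations = []
    for rule in config.get("Rules", []):
        # Only rules whose Status is "Enabled" actually replicate objects.
        if rule.get("Status") == "Enabled":
            bucket_arn = rule.get("Destination", {}).get("Bucket")
            if bucket_arn:
                destinations.append(bucket_arn)
    return destinations
```

An empty result means no enabled rule was found. Note that an S3 bucket ARN does not include a region, so confirming that the destination bucket lives in another region needs a separate lookup.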

Prerequisites for enabling HA on an existing Aviatrix Controller in AWS:

  1. The Controller VPC should preferably contain two public subnets in different AZs to avoid a single-AZ failure.
  2. A public subnet in AWS is one whose route table has a 0.0.0.0/0 route pointing to an IGW.
  3. Enable Controller backup to an S3 bucket.
  4. (Optional) Enable an ALB/NLB that targets the Auto Scaling Group.
    1. Some customers want to access the Controller via its private IP from the internal network; a load balancer avoids having to update DNS when the Controller is replaced.
    2. An ALB can also be used to offload SSL and to enable WAF features.
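The first two prerequisites can be verified before launching the template. Below is a minimal sketch, assuming the subnet and route table dictionaries have the shape returned by the EC2 `DescribeSubnets` and `DescribeRouteTables` APIs for the Controller VPC; the helper names `is_public_route_table` and `public_azs` are made up for illustration.

```python
from typing import List, Set


def is_public_route_table(route_table: dict) -> bool:
    """True if the table has a 0.0.0.0/0 route pointing to an Internet Gateway."""
    for route in route_table.get("Routes", []):
        gateway_id = route.get("GatewayId") or ""
        if route.get("DestinationCidrBlock") == "0.0.0.0/0" and gateway_id.startswith("igw-"):
            return True
    return False


def public_azs(subnets: List[dict], route_tables: List[dict]) -> Set[str]:
    """Return the AZs that contain at least one public subnet."""
    explicit = {}      # subnet ID -> explicitly associated route table
    main_table = None  # fallback for subnets without an explicit association
    for table in route_tables:
        for assoc in table.get("Associations", []):
            if assoc.get("SubnetId"):
                explicit[assoc["SubnetId"]] = table
            if assoc.get("Main"):
                main_table = table

    azs = set()
    for subnet in subnets:
        table = explicit.get(subnet["SubnetId"], main_table)
        if table is not None and is_public_route_table(table):
            azs.add(subnet["AvailabilityZone"])
    return azs
```

If `len(public_azs(subnets, route_tables))` is at least 2, the VPC has public subnets in two different AZs.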

To enable HA for an existing Aviatrix Controller in AWS, launch this CloudFormation template.

The parameters provided here are set as environment variables for the Lambda function:

  • Pick the VPC that contains the Aviatrix Controller.
  • Select one or more subnets: the subnet of the existing Controller and an additional subnet in a different AZ.
  • Provide the Name tag of the Aviatrix Controller.
  • Provide the S3 bucket that contains the Controller backup.
  • Provide an email address for notifications of Auto Scaling Group events.
  • By default, the Lambda function accesses the Controller via its public IP. You can have it use the private IP instead, but you must then manually attach the Lambda function to the VPC subnets and make sure it has either an EIP or a NAT to reach the internet.
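To illustrate how template parameters surface inside a Lambda function as environment variables, here is a minimal sketch. All variable names (`VPC_ID`, `SUBNET_IDS`, `CONTROLLER_NAME_TAG`, `BACKUP_BUCKET`, `NOTIFY_EMAIL`, `USE_PRIVATE_IP`) are hypothetical; the names the Aviatrix template actually uses may differ.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional

# Hypothetical variable names, chosen only for this illustration.
REQUIRED = ("VPC_ID", "SUBNET_IDS", "CONTROLLER_NAME_TAG", "BACKUP_BUCKET", "NOTIFY_EMAIL")


@dataclass(frozen=True)
class HaSettings:
    vpc_id: str
    subnet_ids: List[str]
    controller_name: str
    backup_bucket: str
    notify_email: str
    use_private_ip: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> HaSettings:
    """Read the HA settings from environment variables, failing fast on gaps."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    return HaSettings(
        vpc_id=env["VPC_ID"],
        # Subnets are assumed to arrive as a comma-separated string.
        subnet_ids=[s.strip() for s in env["SUBNET_IDS"].split(",") if s.strip()],
        controller_name=env["CONTROLLER_NAME_TAG"],
        backup_bucket=env["BACKUP_BUCKET"],
        notify_email=env["NOTIFY_EMAIL"],
        # Public IP access is the default, matching the last parameter above.
        use_private_ip=env.get("USE_PRIVATE_IP", "false").lower() == "true",
    )
```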

A Lambda function is created, named by appending “-ha” to your Controller’s name, to restore the Controller configuration from the S3 bucket:

A role is created, named by appending “-role-lambda” to your Controller’s name, to give the Lambda function the appropriate permissions:

An Auto Scaling Group is created with the same name as the Controller and a size of 1 (minimum capacity=0, maximum capacity=1, desired capacity=1).

The Controller’s existing Security Groups are reused:

If you provided multiple subnets when launching the CloudFormation template, the Auto Scaling Group uses these subnets.

The existing Controller instance is added to the Auto Scaling Group.

An SNS topic is created with the same name as the Controller.

Two subscriptions to the SNS topic are created:

  • One for email notification (the recipient must click the confirmation link in the email to confirm the subscription before receiving further emails)
  • One for triggering the Lambda function created earlier
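The naming rules above make it easy to locate the created resources from the Controller's name alone. The following sketch simply restates those rules in code; the function name `ha_resource_names` and the dictionary keys are hypothetical.

```python
from typing import Dict

# Auto Scaling Group sizing described above: exactly one Controller at a time.
ASG_SIZE = {"MinSize": 0, "MaxSize": 1, "DesiredCapacity": 1}


def ha_resource_names(controller_name: str) -> Dict[str, str]:
    """Derive the names of the HA resources from the Controller's name."""
    return {
        "lambda_function": controller_name + "-ha",
        "iam_role": controller_name + "-role-lambda",
        "auto_scaling_group": controller_name,
        "sns_topic": controller_name,
    }
```

For a Controller named `my-controller`, this yields a Lambda function `my-controller-ha`, a role `my-controller-role-lambda`, and an Auto Scaling Group and SNS topic both named `my-controller`.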

When the Auto Scaling Group detects a Controller health issue:

  • Auto Scaling Group
    • Removes the unhealthy instance
    • Creates a new instance, launching it in another AZ if one is available
    • Sends a notification to SNS
    • SNS triggers
      • Email notification
      • Lambda
        • Lambda contacts the new Controller and reassigns the EIP
        • The new Controller restores the backup from the S3 bucket
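The Lambda branch of this flow can be sketched as follows. This is not the Aviatrix Lambda code; it is a minimal illustration that assumes the standard format of Auto Scaling notifications delivered through SNS (a JSON message with `Event` and `EC2InstanceId` fields). The callables `reassign_eip` and `restore_backup` are hypothetical placeholders for the two steps named above.

```python
import json
from typing import Callable, List

LAUNCH_EVENT = "autoscaling:EC2_INSTANCE_LAUNCH"


def launched_instance_ids(event: dict) -> List[str]:
    """Extract IDs of newly launched instances from an SNS-triggered event."""
    ids = []
    for record in event.get("Records", []):
        # The Auto Scaling notification is a JSON string inside the SNS record.
        message = json.loads(record["Sns"]["Message"])
        if message.get("Event") == LAUNCH_EVENT:
            ids.append(message["EC2InstanceId"])
    return ids


def handle_event(
    event: dict,
    reassign_eip: Callable[[str], None],
    restore_backup: Callable[[str], None],
) -> List[str]:
    """For each new Controller instance: move the EIP, then trigger the restore."""
    handled = []
    for instance_id in launched_instance_ids(event):
        reassign_eip(instance_id)    # hypothetical: move the EIP to the new instance
        restore_backup(instance_id)  # hypothetical: ask the Controller to restore from S3
        handled.append(instance_id)
    return handled
```

Terminate and test notifications arrive on the same topic, which is why the sketch filters on the launch event before acting.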

The following will show up in the Activity section of the Auto Scaling Group:

Sample email notification:

Sample CloudWatch log:

Note

It takes several minutes before the new Controller is fully ready, because:

  • The new Controller is created from the image
  • The new Controller is upgraded to the proper version
  • The new Controller restores its configuration from the S3 bucket

During this period, you may see the following screen:

You will be unable to log in with your current password (logging in with the local IP will work, but please don’t do that).

Please watch the CloudWatch log and wait until you see the following message before trying to log in:

Successfully restored backup.
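Waiting for that message can be automated. The sketch below polls a caller-supplied function until the marker text appears or a timeout expires. `fetch_messages` is a hypothetical callable that returns recent log lines; in practice it might wrap the CloudWatch Logs `filter_log_events` call against the Lambda function's log group, which is an assumption about where the message is written.

```python
import time
from typing import Callable, Iterable

RESTORE_MARKER = "Successfully restored backup."


def wait_for_restore(
    fetch_messages: Callable[[], Iterable[str]],
    timeout_s: float = 1800.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll log messages until the restore marker appears.

    Returns True once the marker is seen, False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if any(RESTORE_MARKER in message for message in fetch_messages()):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

The `sleep` parameter is injectable so the loop can be exercised without real delays; the default timeout and interval are arbitrary illustration values.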
