Aviatrix CoPilot Baseline Metrics

Enterprise customers value Aviatrix CoPilot for tracking and gathering evidential data about their networks. The platform aggregates abundant Syslog and NetFlow data, which can be used to establish baseline metrics for alerting. Customers can modify, add, or remove metrics to suit their specific needs. Below is a list of recommended baseline metrics, along with an explanation of what each one means.

Recommended baseline metrics

Credit to my friend: Ricardo Trentin

System Metrics (for Controller/CoPilot/Gateways)
  - Memory Available (memory_available_per) (<= 20%)
  - CPU Used % (cpu_used_per) (>= 90%)
  - Percent Disk Free (hdisk_free_per) (<= 20%)
Network Metrics (for Gateways)
  - PPS Limit Exceeded Rate (rate_pps_limit_exceeded) (>= 75 packets/sec)
  - Bandwidth Egress Limit Exceeded Rate (rate_bandwidth_egress_limit_exceeded) (>= 40 packets/sec)
  - Bandwidth Ingress Limit Exceeded Rate (rate_bandwidth_ingress_limit_exceeded) (>= 40 packets/sec)
  - Errored Packets Transmitted Rate (rate_tx_errs) (>= 40 packets/sec)
  - Errored Packets Received Rate (rate_rx_errs) (>= 40 packets/sec)
  - Rate of Packets Dropped While Receiving (rate_rx_drop) (>= 40 packets/sec)
  - Rate of Packets Dropped While Transmitting (rate_tx_drop) (>= 40 packets/sec)
  - Conntrack Limit Exceeded Rate (rate_conntrack_limit_exceeded) (>= 40 packets/sec)
  - Tunnel Count
Health Metrics (for Gateways)
  - Gateway Status
  - BGP Peering Status
  - Connection Status
  - Underlay Connection Status
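To make the recommended thresholds concrete, here is a minimal Python sketch that encodes them and reports which metrics in a sample cross their threshold. The BASELINE table, the breached() function, and the sample dictionary (including its tunnel_count key) are hypothetical illustrations; this is not a CoPilot API or export format.

```python
import operator
from typing import Callable, Dict, List, Tuple

# (comparison, threshold) per metric, copied from the recommended list above.
# Hypothetical layout for illustration only; not a CoPilot data structure.
BASELINE: Dict[str, Tuple[Callable[[float, float], bool], float]] = {
    "memory_available_per": (operator.le, 20),
    "cpu_used_per": (operator.ge, 90),
    "hdisk_free_per": (operator.le, 20),
    "rate_pps_limit_exceeded": (operator.ge, 75),
    "rate_bandwidth_egress_limit_exceeded": (operator.ge, 40),
    "rate_bandwidth_ingress_limit_exceeded": (operator.ge, 40),
    "rate_tx_errs": (operator.ge, 40),
    "rate_rx_errs": (operator.ge, 40),
    "rate_rx_drop": (operator.ge, 40),
    "rate_tx_drop": (operator.ge, 40),
    "rate_conntrack_limit_exceeded": (operator.ge, 40),
}


def breached(sample: Dict[str, float]) -> List[str]:
    """Return the metrics in `sample` whose value crosses its baseline threshold."""
    alerts = []
    for metric, value in sample.items():
        rule = BASELINE.get(metric)
        if rule is None:
            continue  # no numeric threshold defined (e.g. a tunnel count)
        compare, threshold = rule
        if compare(value, threshold):
            alerts.append(metric)
    return alerts


if __name__ == "__main__":
    sample = {"memory_available_per": 15.0, "cpu_used_per": 42.0,
              "rate_rx_drop": 40.0, "tunnel_count": 12}
    print(breached(sample))  # ['memory_available_per', 'rate_rx_drop']
```

The comparisons are inclusive (<= and >=), matching the list, so a value exactly at the threshold is reported.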

Health metric names and explanations

Name | Description
BGP Peering Status | Reports BGP peering status changes.
Connection Status | Reports gateway tunnel connection status.
Gateway Status | Reports gateway health status.
Underlay Connection Status | The Underlay Connection Status alert indicates a potential underlay communication issue.
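Unlike the numeric metrics, the health metrics report a state, and the descriptions above frame them as status changes. Alerting on them is therefore naturally edge-triggered: notify when the status changes rather than on every sample. A minimal Python sketch of that idea, assuming a time-ordered list of (timestamp, status) pairs; the function name and the "up"/"down" status strings are hypothetical and do not reflect CoPilot's data model.

```python
from typing import Iterable, List, Optional, Tuple


def status_changes(samples: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, old_status, new_status) for every status transition.

    `samples` must be ordered by time. The first sample only sets the
    starting state; it never produces an alert on its own.
    """
    changes = []
    previous: Optional[str] = None
    for timestamp, status in samples:
        if previous is not None and status != previous:
            changes.append((timestamp, previous, status))
        previous = status
    return changes


if __name__ == "__main__":
    bgp = [("10:00", "up"), ("10:05", "up"), ("10:10", "down"), ("10:15", "up")]
    for change in status_changes(bgp):
        print(change)  # one line per transition: down at 10:10, back up at 10:15
```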

Network metric names and explanations

Note: In the Description column, the asterisks (*, **, or ***) indicate which of the reference links below each definition was taken from.

Name | Short name | Base measurement | Description
Transmitted Rate | rate_sent | Mbps | The number of bits per second transmitted by the interface on the Aviatrix gateway VM/instance. **
Received Rate | rate_received | Mbps | The number of bits per second received by the Aviatrix gateway. **
Total Rate | rate_total | Mbps | The total (bidirectional) number of bits per second processed by the interface on the Aviatrix gateway VM/instance. **
Peak Transmitted Rate | rate_peak_sent | Mbps | The highest bit rate transmitted by the interface on the Aviatrix gateway VM/instance. **
Peak Received Rate | rate_peak_received | Mbps | The highest bit rate received by the interface on the Aviatrix gateway VM/instance. **
Peak Total Rate | rate_peak_total | Mbps | The highest combined (received plus transmitted) bit rate processed by the interface on the Aviatrix gateway VM/instance. **
Received Bytes | rx_bytes | MB | Number of good received bytes, corresponding to rx_packets. For IEEE 802.3 devices, this should count the length of Ethernet frames excluding the FCS. ***
Compressed Packets Received | rx_compressed | count | Number of correctly received compressed packets. This counter is only meaningful for interfaces that support packet compression (e.g. CSLIP, PPP). ***
Packets Dropped While Receiving | rx_drop | count | Number of packets received but not processed, e.g. due to lack of resources or an unsupported protocol. For hardware interfaces, this counter may include packets discarded due to L2 address filtering, but should not include packets dropped by the device due to buffer exhaustion, which are counted separately in rx_missed_errors (since procfs folds those two counters together). ***
Errored Packets Received | rx_errs | count | Total number of bad packets received on this network device. This counter must include events counted by rx_length_errors, rx_crc_errors, rx_frame_errors, and other errors not otherwise counted. ***
Receiver FIFO Frames | rx_fifo | count | The number of overflow events when receiving packets.
Received Frames | rx_frame | count | The number of frame alignment errors when receiving packets.
Multicast Packets Received | rx_multicast | count | The number of multicast packets received.
Received Packets | rx_packets | count | Number of good packets received by the interface. For hardware interfaces, this counts all good packets received from the device by the host, including packets the host had to drop at various stages of processing (even in the driver). ***
Transmitted Bytes | tx_bytes | MB | Number of good transmitted bytes, corresponding to tx_packets. For IEEE 802.3 devices, this should count the length of Ethernet frames excluding the FCS. ***
Transmitted Carrier Frames | tx_carrier | count | The number of frame transmission errors due to loss of carrier during transmission.
Collisions during Transmission | tx_colls | count | The number of collisions during packet transmission.
Compressed Packets Transmitted | tx_compressed | count | Number of transmitted compressed packets. This counter is only meaningful for interfaces that support packet compression (e.g. CSLIP, PPP). ***
Packets Dropped during Transmission | tx_drop | count | Number of packets dropped on their way to transmission, e.g. due to lack of resources. ***
Errored Packets Transmitted | tx_errs | count | Total number of transmit problems. This counter must include events counted by tx_aborted_errors, tx_carrier_errors, tx_fifo_errors, tx_heartbeat_errors, tx_window_errors, and other errors not otherwise counted. ***
Transmission FIFO Frames | tx_fifo | count | The number of frame transmission errors due to device FIFO underrun/underflow.
Transmitted Packets | tx_packets | count | Number of packets successfully transmitted. For hardware interfaces, this counts packets the host was able to hand over to the device, which does not necessarily mean the packets were successfully transmitted out of the device, only that the device acknowledged copying them out of host memory. **
Bandwidth Ingress Limit Exceeded | bandwidth_ingress_limit_exceeded | count | The number of packets queued or dropped because the inbound aggregate bandwidth exceeded the maximum for the instance. This is the cumulative number of packets queued or dropped on each network interface since the last driver reset. *
Bandwidth Egress Limit Exceeded | bandwidth_egress_limit_exceeded | count | The number of packets queued or dropped because the outbound aggregate bandwidth exceeded the maximum for the instance. This is the cumulative number of packets queued or dropped on each network interface since the last driver reset. *
PPS Limit Exceeded | pps_limit_exceeded | count | The number of packets queued or dropped because the bidirectional PPS exceeded the maximum for the instance. This is the cumulative number of packets queued or dropped on each network interface since the last driver reset. *
Conntrack Limit Exceeded | conntrack_limit_exceeded | count | The number of packets dropped because connection tracking exceeded the maximum for the instance and new connections could not be established. This can result in packet loss for traffic to or from the instance. This is the cumulative number of packets queued or dropped on each network interface since the last driver reset. *
Linklocal Limit Exceeded | linklocal_limit_exceeded | count | The number of packets dropped because the PPS of the traffic to local proxy services exceeded the maximum for the network interface. This impacts traffic to the DNS service, the Instance Metadata Service, and the Amazon Time Sync Service. This is the cumulative number of packets queued or dropped on each network interface since the last driver reset. *
Packets Transmitted Rate | pkt_tx_rate | packets/sec | The number of packets transmitted per second. **
Packets Received Rate | pkt_rx_rate | packets/sec | The number of packets received per second. **
Total Rate (in packets) | pkt_rate_total | packets/sec | The total (bidirectional) number of packets transmitted and received per second. Instance size affects how many packets per second the gateway can handle. **
Bandwidth Egress Limit Exceeded Rate | rate_bandwidth_egress_limit_exceeded | packets/sec | (AWS only) The number of packets queued or dropped per second because the outbound aggregate bandwidth exceeded the maximum for the instance. *
Bandwidth Ingress Limit Exceeded Rate | rate_bandwidth_ingress_limit_exceeded | packets/sec | (AWS only) The number of packets queued or dropped per second because the inbound aggregate bandwidth exceeded the maximum for the instance. *
Conntrack Limit Exceeded Rate | rate_conntrack_limit_exceeded | packets/sec | (AWS only) The number of packets dropped per second because connection tracking exceeded the maximum for the instance and new connections could not be established. This can result in packet loss for traffic to or from the instance. *
Linklocal Limit Exceeded Rate | rate_linklocal_limit_exceeded | packets/sec | The number of packets dropped per second because the PPS of the traffic to local proxy services exceeded the maximum for the network interface. This impacts traffic to the DNS service, the Instance Metadata Service, and the Amazon Time Sync Service. *
PPS Limit Exceeded Rate | rate_pps_limit_exceeded | packets/sec | The number of packets queued or dropped per second because the bidirectional PPS exceeded the maximum for the instance. *
Compressed Packets Received Rate | rate_rx_compressed | packets/sec | The number of compressed packets received per second. **
Rate of Packets Dropped While Receiving | rate_rx_drop | packets/sec | The number of packets dropped per second while receiving. **
Errored Packets Received Rate | rate_rx_errs | packets/sec | The number of received packets per second that are flagged by the kernel as errored. **
Receiver FIFO Frames Rate | rate_rx_fifo | frames/sec | The number of overflow events per second when receiving packets. **
Received Frames Rate | rate_rx_frame | frames/sec | The number of frame alignment errors per second when receiving packets. **
Multicast Packets Received Rate | rate_rx_multicast | packets/sec | The number of multicast packets received per second. **
Transmitted Carrier Frames Rate | rate_tx_carrier | frames/sec | The number of frame transmission errors per second due to loss of carrier during transmission. **
Collisions Rate during Transmission | rate_tx_colls | packets/sec | The number of collisions per second during packet transmission. **
Compressed Packets Transmitted Rate | rate_tx_compressed | packets/sec | The number of compressed packets transmitted per second. **
Rate of Packets Dropped during Transmission | rate_tx_drop | packets/sec | The number of packets dropped per second while transmitting. **
Errored Packets Transmitted Rate | rate_tx_errs | packets/sec | The total number of transmit problems per second. **
Transmission FIFO Frames Rate | rate_tx_fifo | frames/sec | The number of frame transmission errors per second due to device FIFO underrun/underflow. **
Tunnel Count | | | Total number of IPsec tunnels on Aviatrix gateways.
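Many metrics in the table come in pairs: a cumulative counter (for example rx_drop or bandwidth_ingress_limit_exceeded) and a per-second rate (rate_rx_drop, rate_bandwidth_ingress_limit_exceeded). The rx_*/tx_* short names also line up with the per-interface columns of the Linux /proc/net/dev file. The Python sketch here assumes each rate is simply the counter delta between two snapshots divided by the elapsed time, and that Mbps values are derived from byte counters. The function names, the reset heuristic, and the sample numbers are hypothetical; this illustrates the arithmetic, not how CoPilot actually collects or computes its metrics.

```python
from typing import Dict

# Per-interface field order in Linux /proc/net/dev (receive columns, then transmit).
RX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "frame", "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "colls", "carrier", "compressed"]
COUNTER_NAMES = [f"rx_{f}" for f in RX_FIELDS] + [f"tx_{f}" for f in TX_FIELDS]

# Abridged /proc/net/dev header used for the sample snapshots below.
HEADER = (
    "Inter-|   Receive                                                |  Transmit\n"
    " face |bytes    packets errs drop fifo frame compressed multicast"
    "|bytes    packets errs drop fifo colls carrier compressed\n"
)


def parse_proc_net_dev(text: str) -> Dict[str, Dict[str, int]]:
    """Map each interface to its cumulative counters, e.g. {"eth0": {"rx_drop": 0, ...}}."""
    stats: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # header lines contain no colon
        iface, data = line.split(":", 1)
        values = [int(v) for v in data.split()]
        stats[iface.strip()] = dict(zip(COUNTER_NAMES, values))
    return stats


def counter_rate(prev: int, curr: int, seconds: float) -> float:
    """Per-second increase of a cumulative counter between two samples.

    If the counter went backwards (e.g. after a driver reset), assume it
    restarted from zero and count only the current value.
    """
    delta = curr - prev if curr >= prev else curr
    return delta / seconds


def interface_rates(prev: Dict[str, int], curr: Dict[str, int], seconds: float) -> Dict[str, float]:
    """Derive rate_* values for one interface from two counter snapshots."""
    rates = {f"rate_{name}": counter_rate(prev[name], curr[name], seconds) for name in curr}
    # Convert bytes/sec to Mbps (1 Mbps = 1,000,000 bits per second).
    rates["rate_sent"] = rates["rate_tx_bytes"] * 8 / 1_000_000
    rates["rate_received"] = rates["rate_rx_bytes"] * 8 / 1_000_000
    rates["rate_total"] = rates["rate_sent"] + rates["rate_received"]
    return rates


if __name__ == "__main__":
    snap0 = parse_proc_net_dev(HEADER + "  eth0: 1000000 1000 0 0 0 0 0 0 500000 800 0 0 0 0 0 0\n")
    snap1 = parse_proc_net_dev(HEADER + "  eth0: 13500000 11000 0 50 0 0 0 0 3000000 2800 0 0 0 0 0 0\n")
    rates = interface_rates(snap0["eth0"], snap1["eth0"], seconds=10)
    print(rates["rate_received"], rates["rate_sent"], rates["rate_total"], rates["rate_rx_drop"])
    # prints: 10.0 2.0 12.0 5.0
```

Treating a decrease as a reset matters for the AWS limit counters, which the table describes as cumulative only since the last driver reset. Under the same assumption, a peak value such as rate_peak_sent would be the maximum of rate_sent over a window of such samples.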

Reference
