Enterprise customers value Aviatrix CoPilot for tracking and gathering evidence about their networks. The platform aggregates large volumes of Syslog and NetFlow data, which can be used to establish baseline metrics for alerting. Customers can modify, add, or remove metrics to suit their specific needs. Below is a list of recommended baseline metrics, followed by an explanation of what each one means.
Recommended baseline metrics
Credit to my friend: Ricardo Trentin
Category | Metric | Short name | Alert condition |
---|---|---|---|
System (Controller/CoPilot/Gateways) | Memory Available | memory_available_per | <= 20% |
System (Controller/CoPilot/Gateways) | CPU Used % | cpu_used_per | >= 90% |
System (Controller/CoPilot/Gateways) | Percent Disk Free | hdisk_free_per | <= 20% |
Network (Gateways) | PPS Limit Exceeded Rate | rate_pps_limit_exceeded | >= 75 |
Network (Gateways) | Bandwidth Egress Limit Exceeded Rate | rate_bandwidth_egress_limit_exceeded | >= 40 |
Network (Gateways) | Bandwidth Ingress Limit Exceeded Rate | rate_bandwidth_ingress_limit_exceeded | >= 40 |
Network (Gateways) | Errored Packets Transmitted Rate | rate_tx_errs | >= 40 |
Network (Gateways) | Errored Packets Received Rate | rate_rx_errs | >= 40 |
Network (Gateways) | Rate of Packets Dropped While Receiving | rate_rx_drop | >= 40 |
Network (Gateways) | Rate of Packets Dropped While Transmitting | rate_tx_drop | >= 40 |
Network (Gateways) | Conntrack Limit Exceeded Rate | rate_conntrack_limit_exceeded | >= 40 |
Network (Gateways) | Tunnel Count | | |
Health (Gateways) | Gateway Status | | |
Health (Gateways) | BGP Peering Status | | |
Health (Gateways) | Connection Status | | |
Health (Gateways) | Underlay Connection Status | | |
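The recommended thresholds can also be written down as data and checked against a metric sample. The following is a minimal sketch in Python, not CoPilot's own alerting logic: the `Baseline` type, the `breached` helper, and the sample dictionary are hypothetical names introduced for illustration, and only the metric short names and threshold values come from the list above.

```python
import operator
from typing import Callable, Dict, List, NamedTuple


class Baseline(NamedTuple):
    """One recommended alert condition (hypothetical helper type)."""
    metric: str
    compare: Callable[[float, float], bool]
    threshold: float


# Short names and thresholds copied from the recommended baseline list.
BASELINES: List[Baseline] = [
    Baseline("memory_available_per", operator.le, 20.0),
    Baseline("cpu_used_per", operator.ge, 90.0),
    Baseline("hdisk_free_per", operator.le, 20.0),
    Baseline("rate_pps_limit_exceeded", operator.ge, 75.0),
    Baseline("rate_bandwidth_egress_limit_exceeded", operator.ge, 40.0),
    Baseline("rate_bandwidth_ingress_limit_exceeded", operator.ge, 40.0),
    Baseline("rate_tx_errs", operator.ge, 40.0),
    Baseline("rate_rx_errs", operator.ge, 40.0),
    Baseline("rate_rx_drop", operator.ge, 40.0),
    Baseline("rate_tx_drop", operator.ge, 40.0),
    Baseline("rate_conntrack_limit_exceeded", operator.ge, 40.0),
]


def breached(sample: Dict[str, float]) -> List[str]:
    """Return the short names of metrics in `sample` that meet their alert condition."""
    return [
        b.metric
        for b in BASELINES
        if b.metric in sample and b.compare(sample[b.metric], b.threshold)
    ]


# Made-up sample values, for illustration only.
sample = {"memory_available_per": 15.0, "cpu_used_per": 42.0, "rate_rx_drop": 40.0}
print(breached(sample))  # ['memory_available_per', 'rate_rx_drop']
```

Keeping the conditions in a list like this makes it easy to adjust a threshold or add and remove metrics, which is the kind of tuning the baseline is meant to allow.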
Health metric names and explanations
NAME | DESCRIPTION |
---|---|
BGP Peering Status | Reporting BGP peering status changes |
Connection Status | Reporting gateway tunnel connection status |
Gateway Status | Reporting gateway health status |
Underlay Connection Status | The Underlay Connection Status alert indicates a potential underlay communication issue |
Network metric names and explanations
Note: In the Description field, the stars (*, ** or ***) indicate which of the reference links (below) the definition was taken from.
Name | Short name | Unit | Description |
---|---|---|---|
Transmitted Rate | rate_sent | Mbps | The rate at which bits are transmitted by the interface on the Aviatrix gateway VM/instance. ** |
Received Rate | rate_received | Mbps | The rate at which bits are received by the Aviatrix gateway. ** |
Total Rate | rate_total | Mbps | The total (bidirectional) rate at which bits are processed by the interface on the Aviatrix gateway VM/instance. ** |
Peak Transmitted Rate | rate_peak_sent | Mbps | The highest bit rate transmitted by the interface on the Aviatrix gateway VM/instance. ** |
Peak Received Rate | rate_peak_received | Mbps | The highest bit rate received by the interface on the Aviatrix gateway VM/instance. ** |
Peak Total Rate | rate_peak_total | Mbps | The highest total (received and transmitted) bit rate on the interface of the Aviatrix gateway VM/instance. ** |
Received Bytes | rx_bytes | MB | Number of good received bytes, corresponding to rx_packets. For IEEE 802.3 devices, this should count the length of Ethernet frames excluding the FCS. *** |
Compressed Packets Received | rx_compressed | count | Number of correctly received compressed packets. This counter is only meaningful for interfaces that support packet compression (e.g. CSLIP, PPP). *** |
Packets Dropped While Receiving | rx_drop | count | Number of packets received but not processed, e.g. due to lack of resources or unsupported protocol. For hardware interfaces this counter may include packets discarded due to L2 address filtering but should not include packets dropped by the device due to buffer exhaustion which are counted separately in rx_missed_errors (since procfs folds those two counters together). *** |
Errored Packets Received | rx_errs | count | Total number of bad packets received on this network device. This counter must include events counted by rx_length_errors, rx_crc_errors, rx_frame_errors and other errors not otherwise counted. *** |
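Receiver FIFO Frames | rx_fifo | count | Number of overflow events when receiving packets; the cumulative counterpart of rate_rx_fifo. |
Received Frames | rx_frame | count | Number of frame alignment errors when receiving packets; the cumulative counterpart of rate_rx_frame. |
Multicast Packets Received | rx_multicast | count | Number of multicast packets received; the cumulative counterpart of rate_rx_multicast. |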
Received Packets | rx_packets | count | Number of good packets received by the interface. For hardware interfaces counts all good packets received from the device by the host, including packets which host had to drop at various stages of processing (even in the driver). *** |
Transmitted Bytes | tx_bytes | MB | Number of good transmitted bytes, corresponding to tx_packets. For IEEE 802.3 devices, this should count the length of Ethernet frames excluding the FCS. *** |
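Transmitted Carrier Frames | tx_carrier | count | Number of frame transmission errors due to loss of carrier during transmission; the cumulative counterpart of rate_tx_carrier. |
Collisions during Transmission | tx_colls | count | Number of collisions during packet transmission; the cumulative counterpart of rate_tx_colls. |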
Compressed Packets Transmitted | tx_compressed | count | Number of transmitted compressed packets. This counter is only meaningful for interfaces that support packet compression (e.g. CSLIP, PPP). *** |
Packets Dropped during Transmission | tx_drop | count | Number of packets dropped on their way to transmission, e.g. due to lack of resources. *** |
Errored Packets Transmitted | tx_errs | count | Total number of transmit problems. This counter must include events counted by tx_aborted_errors, tx_carrier_errors, tx_fifo_errors, tx_heartbeat_errors, tx_window_errors and other errors not otherwise counted. *** |
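Transmission FIFO Frames | tx_fifo | count | Number of frame transmission errors due to device FIFO underrun/underflow; the cumulative counterpart of rate_tx_fifo. |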
Transmitted Packets | tx_packets | count | Number of packets successfully transmitted. For hardware interfaces counts packets which host was able to successfully hand over to the device, which does not necessarily mean that packets had been successfully transmitted out of the device, only that device acknowledged it copied them out of host memory. ** |
Bandwidth Ingress Limit Exceeded | bandwidth_ingress_limit_exceeded | count | The number of packets queued or dropped because the inbound aggregate bandwidth exceeded the maximum for the instance. This is the cumulative number of packets queued or dropped on each network interface since the last driver reset. * |
Bandwidth Egress Limit Exceeded | bandwidth_egress_limit_exceeded | count | The number of packets queued or dropped because the outbound aggregate bandwidth exceeded the maximum for the instance. This is the cumulative number of packets queued or dropped on each network interface since the last driver reset. * |
PPS Limit Exceeded | pps_limit_exceeded | count | The number of packets queued or dropped because the bidirectional PPS exceeded the maximum for the instance. This is the cumulative number of packets queued or dropped on each network interface since the last driver reset. * |
Conntrack Limit Exceeded | conntrack_limit_exceeded | count | The number of packets dropped because connection tracking exceeded the maximum for the instance and new connections could not be established. This can result in packet loss for traffic to or from the instance. This is the cumulative number of packets queued or dropped on each network interface since the last driver reset. * |
Linklocal Limit Exceeded | linklocal_limit_exceeded | count | The number of packets dropped because the PPS of the traffic to local proxy services exceeded the maximum for the network interface. This impacts traffic to the DNS service, the Instance Metadata Service, and the Amazon Time Sync Service. This is the cumulative number of packets queued or dropped on each network interface since the last driver reset. * |
Packets Transmitted Rate | pkt_tx_rate | packets/sec | The number of packets transmitted per second. ** |
Packets Received Rate | pkt_rx_rate | packets/sec | The number of packets received per second. ** |
Total Rate (in packets) | pkt_rate_total | packets/sec | The total (bidirectional) number of packets processed per second. Instance size affects how many packets per second the gateway can handle. ** |
Bandwidth Egress Limit Exceeded Rate | rate_bandwidth_egress_limit_exceeded | packets/sec | (AWS Only) The number of packets queued or dropped per second because the outbound aggregate bandwidth exceeded the maximum for the instance. * |
Bandwidth Ingress Limit Exceeded Rate | rate_bandwidth_ingress_limit_exceeded | packets/sec | (AWS Only) The number of packets queued or dropped per second because the inbound aggregate bandwidth exceeded the maximum for the instance. * |
Conntrack Limit Exceeded Rate | rate_conntrack_limit_exceeded | packets/sec | (AWS Only) The number of packets dropped per second because connection tracking exceeded the maximum for the instance and new connections could not be established. This can result in packet loss for traffic to or from the instance. * |
Linklocal Limit Exceeded Rate | rate_linklocal_limit_exceeded | packets/sec | The number of packets dropped per second because the PPS of the traffic to local proxy services exceeded the maximum for the network interface. This impacts traffic to the DNS service, the Instance Metadata Service, and the Amazon Time Sync Service. * |
PPS Limit Exceeded Rate | rate_pps_limit_exceeded | packets/sec | The number of packets queued or dropped per second because the bidirectional PPS exceeded the maximum for the instance. * |
Compressed Packets Received Rate | rate_rx_compressed | packets/sec | The number of compressed packets received per second. ** |
Rate of Packets Dropped While Receiving | rate_rx_drop | packets/sec | The number of packets dropped per second while receiving. ** |
Errored Packets Received Rate | rate_rx_errs | packets/sec | The number of packets received per second that are flagged by the kernel as errored. ** |
Receiver FIFO Frames Rate | rate_rx_fifo | frames/sec | The number of overflow events per second when receiving packets. ** |
Received Frames Rate | rate_rx_frame | frames/sec | The number of frame alignment errors per second when receiving packets. ** |
Multicast Packets Received Rate | rate_rx_multicast | packets/sec | The number of multicast packets received per second. ** |
Transmitted Carrier Frames Rate | rate_tx_carrier | frames/sec | The number of frame transmission errors per second due to loss of carrier during transmission. ** |
Collisions Rate during Transmission | rate_tx_colls | packets/sec | The number of collisions per second during packet transmission. ** |
Compressed Packets Transmitted Rate | rate_tx_compressed | packets/sec | The number of compressed packets transmitted per second. ** |
Rate of Packets Dropped during Transmission | rate_tx_drop | packets/sec | The number of packets dropped per second while sending. ** |
Errored Packets Transmitted Rate | rate_tx_errs | packets/sec | The total number of transmit problems per second. ** |
Transmission FIFO Frames Rate | rate_tx_fifo | frames/sec | The number of frame transmission errors per second due to device FIFO underrun/underflow. ** |
Tunnel Count | | count | Total number of IPsec tunnels on Aviatrix gateways. |
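Several rows in this table come in pairs: a cumulative counter (for example pps_limit_exceeded, counted since the last driver reset) and a per-second rate (rate_pps_limit_exceeded). As an illustration of how the two relate, the Python sketch below derives a rate from two counter samples. It is an assumption about the arithmetic, not a description of how CoPilot computes its rate metrics; the function name and the sample numbers are hypothetical.

```python
from typing import Optional


def per_second_rate(prev_count: int, curr_count: int, interval_seconds: float) -> Optional[float]:
    """Derive a per-second rate from two samples of a cumulative counter.

    Returns None when the counter went backwards, which is what a driver
    reset between the two samples would look like.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if curr_count < prev_count:
        return None  # counter was reset; no meaningful rate for this interval
    return (curr_count - prev_count) / interval_seconds


# Hypothetical samples of pps_limit_exceeded taken 60 seconds apart.
print(per_second_rate(1_000, 3_400, 60))  # 40.0
```

Because the counters only grow until a reset, an alert on the rate reflects what is happening now, whereas the raw count also includes everything that happened since the driver last restarted.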