Learning about Traceroute, ICMP and the IP Route Table

We use traceroute all the time and tend to take it for granted, until a very interesting question came up and I had to dive a little deeper to get the answer. Here’s the full story:

Aviatrix CloudN is an appliance that delivers line-rate encryption from on-premises towards the Aviatrix Transit Gateways. It ships with three interfaces:

  • eth0: WAN interface. This is where IPsec tunnels are built towards the Aviatrix Transit Gateways; BGP sessions are then established between CloudN and the Aviatrix Transit Gateways.
  • eth1: LAN interface. This is where BGP is established between CloudN and the on-premises router.
  • eth2: MGMT interface. This is where you connect to CloudN for management, and where CloudN connects to the internet for software updates.

It’s very common practice to connect all three interfaces to the same router, with VRFs configured on the router to segment them. As you may recall from my previous blog, Direct Connect to Aviatrix Transit – Option 3, the WAN, LAN and MGMT (not in the diagram) interfaces can connect to the same router as shown below.

After CloudN was placed inline with traffic, the customer ran a traceroute from on-premises towards the cloud and discovered that the CloudN hop responded with its management interface IP rather than its LAN interface IP.

The customer was rightfully concerned that data traffic might actually be going through the MGMT interface instead of the LAN interface.

Before understanding traceroute, we need to understand Time to Live (TTL).

(The following tests were done on Windows; the Linux/macOS command lines may vary.)

When we issue a command to ping the Google DNS server 8.8.8.8, Windows ping sends 4 ICMP Echo Requests towards 8.8.8.8:

ping 8.8.8.8

Pinging 8.8.8.8 with 32 bytes of data:
Reply from 8.8.8.8: bytes=32 time=27ms TTL=116
Reply from 8.8.8.8: bytes=32 time=23ms TTL=116
Reply from 8.8.8.8: bytes=32 time=26ms TTL=116
Reply from 8.8.8.8: bytes=32 time=23ms TTL=116

Ping statistics for 8.8.8.8:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 23ms, Maximum = 27ms, Average = 24ms

Each router passes the ICMP packet to the next hop until it reaches 8.8.8.8, which then sends an ICMP Echo Reply back to the client.

Packet capture shows:

  • Reply source 8.8.8.8
  • ICMP type is 0 (Echo Reply)

Every IP packet carries a TTL, and the ping TTL switch (-i on Windows) lets us set its initial value. Each router on the path decrements the TTL by 1 and forwards the packet towards the next hop if the TTL has not reached zero. The first router where the TTL reaches 0 drops the packet and returns an ICMP Time Exceeded message directly to the client; the packet is not forwarded any further.
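To make the decrement-and-check behaviour concrete, here is a minimal Python sketch. It is only a model, not a real network probe: the function name send_probe, the two-router path and its addresses are assumptions made up for illustration.

```python
from typing import List, Tuple

ICMP_ECHO_REPLY = 0      # sent by the destination itself
ICMP_TIME_EXCEEDED = 11  # sent by the router where the TTL ran out


def send_probe(path: List[str], destination: str, ttl: int) -> Tuple[str, int]:
    """Return (responder_ip, icmp_type) for one echo request with the given TTL."""
    if ttl < 1:
        raise ValueError("TTL must be at least 1")
    for router in path:
        ttl -= 1          # every router decrements the TTL by 1
        if ttl == 0:      # expired here: drop the packet and report back
            return router, ICMP_TIME_EXCEEDED
        # otherwise the packet is forwarded to the next hop
    return destination, ICMP_ECHO_REPLY


# Hypothetical path with two routers in front of the destination.
PATH = ["192.168.68.1", "192.168.1.1"]

print(send_probe(PATH, "8.8.8.8", ttl=1))
print(send_probe(PATH, "8.8.8.8", ttl=2))
print(send_probe(PATH, "8.8.8.8", ttl=128))
```

With a TTL of 1 or 2 the answer comes from a router with type 11; only a TTL large enough to survive every hop gets a type 0 reply from the destination.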

The following example pings 8.8.8.8 with TTL = 1. The first hop (the default gateway) receives the packet, decrements the TTL by 1 to 0, and responds with a TTL expired in transit message:

 ping -i 1 8.8.8.8

Pinging 8.8.8.8 with 32 bytes of data:
Reply from 192.168.68.1: TTL expired in transit.
Reply from 192.168.68.1: TTL expired in transit.
Reply from 192.168.68.1: TTL expired in transit.
Reply from 192.168.68.1: TTL expired in transit.

Ping statistics for 8.8.8.8:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),

Packet capture shows:

  • Reply source 192.168.68.1, not 8.8.8.8
  • ICMP type is 11 (Time Exceeded)
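The two ICMP types seen in the captures so far (0 and 11) live in the first byte of the ICMP header, followed by a code byte and a 2-byte checksum. The Python sketch below decodes those fields from raw bytes. The sample headers are built by hand for illustration and are not taken from the actual capture, and describe_icmp is a hypothetical helper name.

```python
import struct

ICMP_TYPES = {0: "Echo Reply", 8: "Echo Request", 11: "Time Exceeded"}


def describe_icmp(header: bytes) -> str:
    """Decode the ICMP type and code from the first bytes of an ICMP message."""
    # ICMP starts with: type (1 byte), code (1 byte), checksum (2 bytes).
    icmp_type, code, _checksum = struct.unpack("!BBH", header[:4])
    name = ICMP_TYPES.get(icmp_type, "other")
    return f"type={icmp_type} code={code} ({name})"


# Hand-built sample headers; the checksum is left at 0 because it is not verified here.
echo_reply = struct.pack("!BBH", 0, 0, 0)
ttl_exceeded = struct.pack("!BBH", 11, 0, 0)

print(describe_icmp(echo_reply))
print(describe_icmp(ttl_exceeded))
```

For type 11, code 0 means the TTL was exceeded in transit, which is the message Windows ping prints as "TTL expired in transit".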

Now we try TTL = 2. The first-hop router decrements the TTL by 1; it is not zero yet, so it forwards the packet to the second router, which also decrements the TTL by 1 and reaches 0. The second router responds with TTL expired in transit from its own IP.

ping -i 2 8.8.8.8

Pinging 8.8.8.8 with 32 bytes of data:
Reply from 192.168.1.1: TTL expired in transit.
Reply from 192.168.1.1: TTL expired in transit.
Reply from 192.168.1.1: TTL expired in transit.
Reply from 192.168.1.1: TTL expired in transit.

Ping statistics for 8.8.8.8:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),

Packet capture shows:

  • Reply source 192.168.1.1, not 8.8.8.8
  • ICMP type is 11 (Time Exceeded)

Traceroute sends 3 ICMP requests with the TTL set to 1, then 3 more with the TTL set to 2, and repeats this process until the destination replies or the TTL reaches the default maximum of 30:

tracert 8.8.8.8

Tracing route to dns.google [8.8.8.8]
over a maximum of 30 hops:

  1     6 ms     4 ms     4 ms  192.168.68.1
  2     8 ms     6 ms     6 ms  192.168.1.1
  3    22 ms    22 ms    21 ms  69.194.50.2
  4    24 ms    24 ms    23 ms  69.194.50.1
  5    25 ms    22 ms    24 ms

(Canceled with Ctrl+C at this point.)

In the capture, you can see the TTL start at 1 and increase by 1 every 3 requests. Also notice that the responding IPs match what you see in the tracert output.
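The probing schedule itself is simple enough to sketch in Python. This is a simulation rather than a real tracert: the four router addresses are borrowed from the output above, but placing 8.8.8.8 directly behind them is an assumption (the real trace was canceled before reaching it), and the names trace and responder_for are hypothetical.

```python
from typing import List, Tuple

PROBES_PER_HOP = 3
MAX_HOPS = 30


def responder_for(path: List[str], destination: str, ttl: int) -> str:
    """Router number `ttl` answers if there is one; otherwise the destination does."""
    return path[ttl - 1] if ttl <= len(path) else destination


def trace(path: List[str], destination: str) -> List[Tuple[int, str]]:
    """Return one (hop_number, responder_ip) row per TTL value, like tracert."""
    rows = []
    for ttl in range(1, MAX_HOPS + 1):
        # Three probes per TTL; in this model they all reach the same responder.
        replies = [responder_for(path, destination, ttl) for _ in range(PROBES_PER_HOP)]
        rows.append((ttl, replies[0]))
        if destination in replies:
            break  # the destination answered, so the trace is complete
    return rows


# Assumption: pretend the destination sits right behind these four routers.
PATH = ["192.168.68.1", "192.168.1.1", "69.194.50.2", "69.194.50.1"]

for hop, ip in trace(PATH, "8.8.8.8"):
    print(f"{hop:>3}  {ip}")
```

Each row is produced by whichever device saw the TTL run out, which is why the addresses in a trace are the responders' own choice of source IP.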

OK, we have established that a traceroute response comes from the router where the TTL is decremented to 0. Then why does it leave through the MGMT interface instead of the LAN interface?

According to RFC 1812:

Except where this document specifies otherwise, the IP source address in an ICMP message originated by the router MUST be one of the IP addresses associated with the physical interface over which the ICMP message is transmitted. If the interface has no IP addresses associated with it, the router’s router-id (see Section [5.2.5]) is used instead.
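The coupling between the egress interface and the source address is easy to observe on an ordinary host as well. The Python sketch below is only an analogy for what a router does with its ICMP messages: it asks the OS which local address it would pick as the source towards a destination. The helper name source_ip_for and the use of port 9 are arbitrary choices for illustration.

```python
import socket


def source_ip_for(destination: str) -> str:
    """Ask the OS which local IP it would use as the source towards `destination`."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # connect() on a UDP socket sends nothing; it only makes the OS run a
        # route lookup and bind the socket to the source address it selects.
        sock.connect((destination, 9))
        return sock.getsockname()[0]


# The loopback route is chosen for a loopback destination.
print(source_ip_for("127.0.0.1"))
```

On a multi-homed machine, calling the function with different destinations returns the address of whichever interface the route table selects for each one.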

For folks with a Windows background, there is only one route table in the OS. When I was at Microsoft Support, dual default gateways were always a no-no. This article has remained relevant since Windows NT: Multiple default gateways can cause connectivity problems

In Linux, multiple route tables coexist, which makes VRF (Virtual Routing and Forwarding) possible.

If you obtain diagnostic logs from CloudN, you can find the following IP rules:

 "ip rule": [
                "0:\tfrom all lookup local ",
                "5:\tfrom all fwmark 0xf4240 lookup mgmt ",
                "10:\tfrom all iif lo lookup exclude_gateway ",
                "32766:\tfrom all lookup main ",
                "32767:\tfrom all lookup default"
            ],
  • 0: First it looks at the local route table, where it won’t find a match, because the reply is destined for the traceroute client rather than one of CloudN’s own addresses
  • 5: Then it checks whether the packet is marked 0xf4240, which identifies packets that came in from the MGMT interface eth2. This won’t match either, as the packet originated from CloudN itself
  • 10: Packets that originate locally (iif lo) use the exclude_gateway route table. This is a match, as the ICMP TTL exceeded message originated from CloudN itself
  • No further rules are processed, as a matching route has been found.

In the exclude_gateway route table, the only match is the default route, which sends the packet out via eth2, the MGMT interface.
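The rule walk can be modelled in a few lines of Python. The rule list mirrors the "ip rule" output above, but everything else is an assumption for illustration: the table contents, the 10.x and 172.16.x prefixes, the client address 10.50.0.5 and the interface a forwarded packet arrives on are all hypothetical. Only the default route via eth2 in exclude_gateway comes from the diagnostics described here.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

MGMT_MARK = 0xF4240  # 1000000 in decimal

# (priority, selector, table) in the order shown by "ip rule".
RULES = [
    (0, {}, "local"),
    (5, {"fwmark": MGMT_MARK}, "mgmt"),
    (10, {"iif": "lo"}, "exclude_gateway"),
    (32766, {}, "main"),
    (32767, {}, "default"),
]

# Hypothetical table contents as (prefix, egress interface).
TABLES: Dict[str, List[Tuple[str, str]]] = {
    "local": [("10.0.1.10/32", "lo"), ("10.0.2.10/32", "lo")],
    "mgmt": [("0.0.0.0/0", "eth2")],
    "exclude_gateway": [("0.0.0.0/0", "eth2")],
    "main": [("10.50.0.0/16", "eth1"), ("172.16.0.0/12", "eth0")],
    "default": [],
}


def lookup(table: str, dst: str) -> Optional[str]:
    """Longest-prefix match of dst within one table; None if no route matches."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, dev in TABLES[table]:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, dev)
    return best[1] if best else None


def route(dst: str, iif: str, fwmark: int = 0) -> Tuple[int, str, str]:
    """Walk the rules in priority order; return (priority, table, egress interface)."""
    packet = {"iif": iif, "fwmark": fwmark}
    for priority, selector, table in RULES:
        if any(packet[key] != value for key, value in selector.items()):
            continue  # selector does not match this packet, try the next rule
        dev = lookup(table, dst)
        if dev is not None:
            return priority, table, dev  # first table with a route wins
    raise LookupError("no route to host")


# ICMP TTL exceeded generated by CloudN itself (iif lo) towards the client.
print(route("10.50.0.5", iif="lo"))
# A forwarded data packet to the same client that arrived on another interface.
print(route("10.50.0.5", iif="eth0"))
```

Under these assumptions the locally generated ICMP message leaves via eth2 through the exclude_gateway table, while forwarded data to the very same client falls through to the main table and leaves via eth1.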

Conclusion: Traceroute responses originate from the responding router itself. They don’t represent the actual path the data takes, because forwarded data is subject to different routing tables.

Another mystery solved 🙂
