The host ping across switches is unreachable

I would recommend debugging this step by step.

Pick one ping between hosts that is failing, e.g. h1 ping h3 (or whichever ping is currently not working). Start a fresh run of the entire network in which you have not sent any packets yet, so that the switch log files are initially empty. Send one ping packet (or, if you cannot limit it to one, stop the ping quickly so that only a few identical packets are sent). Then look at the directory containing the switch log files and see which files are the largest; those switches have logged messages about processing at least one packet. Most of the log files should be very short, meaning those switches have not processed any packets.
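
If the directory has many log files, a small script can rank them by size for you. This is only a sketch: the `*.log` pattern and the function name `rank_logs_by_size` are my own assumptions, so adjust them to match however your setup names its switch logs.

```python
from pathlib import Path


def rank_logs_by_size(log_dir, pattern="*.log"):
    """Return (size_in_bytes, file_name) pairs, largest file first.

    The "*.log" pattern is an assumption about how the log files are named.
    """
    files = [p for p in Path(log_dir).glob(pattern) if p.is_file()]
    return sorted(((p.stat().st_size, p.name) for p in files), reverse=True)


# Example use (the directory name "logs" is hypothetical):
#     for size, name in rank_logs_by_size("logs"):
#         print(f"{size:>10}  {name}")
```

The files at the top of the list belong to the switches that saw the packet, and those are the ones worth reading first.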

Pick the first switch that you think the packet should have reached, and examine its log file. It is a bit tedious the first time, but you’ll get faster at spotting the most important things after going through one log in detail. See where that switch sends the packet next, or whether it drops the packet.

If it sends the packet to another switch, go to that switch’s log file and see what it does with the packet, continuing until you get to a switch that either (a) drops the packet, or (b) does something you did not want it to do with the packet. Work out from the log file why it is doing that unexpected thing, and think about whether the cause is incorrect table entries, wrong P4 code, or both. Fix the P4 code or table entries for that switch until you believe they are correct, then do another run, starting again from empty log files.
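
When a log is long, it can help to filter it down to the lines that mention a forwarding decision before reading it in full. The sketch below does a plain case-insensitive substring match. The default keywords (`drop`, `egress`, `table`) are guesses on my part, not the actual wording of any particular switch's log, so replace them with whatever your switch really prints.

```python
def packet_decisions(lines, keywords=("drop", "egress", "table")):
    """Yield (line_number, text) for each line containing any keyword.

    Matching is a case-insensitive substring test. The default keywords
    are placeholders; they are not taken from a real log format.
    """
    lowered = tuple(k.lower() for k in keywords)
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if any(k in text.lower() for k in lowered):
            yield number, text


# Example use (the file name "s1.log" is hypothetical):
#     with open("s1.log") as f:
#         for number, text in packet_decisions(f):
#             print(number, text)
```

The line numbers let you jump back into the full log to read the context around each decision, which you will still need in order to understand why the switch made it.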

After getting one failing ping to work, if you have an idea of what might be wrong with the other switches’ table entries, try fixing all of them. If you do not yet have an idea of something more general that might be wrong with the table entries, pick another pair of hosts where the ping is failing, and repeat the debug process above.

Note: This is general advice that applies to any network of devices for which you have detailed logging of how they behave.
