Is there any way to apply ipv4 forwarding to a cloned packet without recirculating?

Hello,
I tried clone(CloneType.I2E, REPORT_MIRROR_SESSION_ID) in P4, and I found that I have to modify the ipv4 header manually to make the clone forward successfully, since an I2E clone sends the packet straight to the egress pipeline, while my ipv4_lpm table is in the ingress pipeline. Applying recirculate() in egress works, but it doubles the bandwidth. I wonder if I can do this without recirculating the packet. Thanks.
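
For reference, here is a simplified sketch of my situation (v1model; the header and metadata definitions and the rest of the program are omitted, and the details are not exactly my code):

```
const bit<32> REPORT_MIRROR_SESSION_ID = 250;  // example value

control MyIngress(inout headers hdr,
                  inout metadata meta,
                  inout standard_metadata_t standard_metadata) {
    action ipv4_forward(bit<48> dstAddr, bit<9> port) {
        standard_metadata.egress_spec = port;
        hdr.ethernet.srcAddr = hdr.ethernet.dstAddr;
        hdr.ethernet.dstAddr = dstAddr;
        hdr.ipv4.ttl = hdr.ipv4.ttl - 1;
    }
    table ipv4_lpm {
        key = { hdr.ipv4.dstAddr : lpm; }
        actions = { ipv4_forward; NoAction; }
        default_action = NoAction();
    }
    apply {
        ipv4_lpm.apply();   // forwards the original packet
        // The I2E clone goes straight to the traffic manager and shows up
        // in egress, so it never goes through ipv4_lpm.
        clone(CloneType.I2E, REPORT_MIRROR_SESSION_ID);
    }
}
```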

Are you saying that you have an ipv4_lpm table in ingress, and the original uncloned packet should look up one key to get one result, but you want your cloned packet to look up a different key and get a different result?

If you can somehow calculate during ingress which output port you want the cloned packet to go to (e.g. with a separate ipv4_lpm_for_clones table that is a copy of the ipv4_lpm table), you could have one mirror session per output port, each configured with a different output port to send the clone to.
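
In v1model that could look roughly like the fragment below (a sketch, not a complete program; CLONE_SESSION_BASE and meta.clone_port are names I am making up here, and the control plane would have to configure session CLONE_SESSION_BASE + p to send clones to port p for every port p in use):

```
const bit<32> CLONE_SESSION_BASE = 100;   // example numbering

action set_clone_port(bit<9> port) {
    meta.clone_port = port;               // assumed user metadata field
}
table ipv4_lpm_for_clones {               // kept in sync with ipv4_lpm by the control plane
    key = { hdr.ipv4.dstAddr : lpm; }
    actions = { set_clone_port; NoAction; }
    default_action = NoAction();
}
apply {
    // ... normal ipv4_lpm processing of the original packet ...
    ipv4_lpm_for_clones.apply();
    // one session per port: session (BASE + p) mirrors to port p
    clone(CloneType.I2E, CLONE_SESSION_BASE + (bit<32>)meta.clone_port);
}
```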

I am guessing there are other options, but I’m not immediately thinking of any others.

If cloned packets are “unusual”, e.g. only 1% of the packets need cloning, then note that recirculation doesn’t double the bandwidth, it only adds 1%. If you are cloning every single packet, then yes, it does double the number of packets processed in ingress.

Thanks for your help! To be more specific, I just need the cloned packet to behave like the original packet, and the output port is always the same (there is only one destination for this kind of packet), so my mirror session is also fixed. So maybe I should copy the ipv4_lpm table into the egress pipeline?

If you have multiple output ports that packets from ingress go to, e.g. 8 of them, then it seems to me that you would need 8 different mirror sessions, each one configured so that cloned packets go to a different output port, if you want the cloned packet to go to the same output port as the original.
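
With BMv2’s simple_switch, for example, those sessions are just a few simple_switch_CLI commands (the numbering scheme is arbitrary, it only has to match what the P4 program computes):

```
# one mirror session per egress port: session 100+N sends clones to port N
mirroring_add 100 0
mirroring_add 101 1
mirroring_add 102 2
# ... and so on for the remaining ports
```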

Another solution I didn’t think of before is to use multicast in the traffic manager instead of unicast. Every packet that you would normally send as unicast to port X would instead go to a multicast group X (or some simple formula of X) that was configured to send two copies of the packet to port X, each with a different replication id so that egress processing can process them differently.
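
A sketch of that idea for v1model (the group numbering, the rid values and the “group X+1 sends two copies to port X” convention are all illustrative; the groups themselves would be created by the control plane, e.g. with mc_mgrp_create / mc_node_create / mc_node_associate in simple_switch_CLI):

```
// Ingress: route as usual, but replicate instead of unicasting.
action ipv4_forward_mc(bit<9> port) {
    // group (port + 1) is configured with two copies to that port:
    // egress_rid 0 = the normal copy, egress_rid 1 = the "report" copy
    standard_metadata.mcast_grp = (bit<16>)port + 1;
}

// Egress: tell the two copies apart by their replication id.
if (standard_metadata.egress_rid == 1) {
    // the "report" copy: encapsulate / send toward the collector
} else {
    // the normal forwarded copy
}
```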

The problem you are describing is very typical of high-speed hardware switches in general; it is related to the pipelined nature of the processing and appears in many different scenarios.

Here are some examples:

  1. Normally (in the regular SW stack case) a route specifies the IP address of the nexthop router. On most hardware switches this is not done, since it is expensive to perform yet another lookup in the routing table (and maybe even more than one). Instead, the NOS controlling the switch resolves the nexthop’s IP address, and the output of the routing table in the data plane is typically the nexthop info, such as the new L2 destination MAC address, the new VLAN and the egress port.
  2. You might notice that having the egress port as the output of a routing (or nexthop) table is excessive. Theoretically it can be obtained by performing an L2 lookup, and that’s what a SW stack would do. However, in a high-speed switch it is usually too late to do that by then, which is why it is the responsibility of the NOS to put the correct egress port number into the nexthop table (see the sketch after this list). BTW, it is also the NOS’s responsibility to change this port if the MAC moves for whatever reason.
  3. Exactly the same thing happens if the packet has to be encapsulated into a tunnel (e.g. IP-in-IP). Usually the new header is added a lot later, way past the point where one could perform a lookup on that destination IP address. Instead the switch will again rely on the NOS to program all the parameters correctly for this scenario (i.e. the new destination MAC will correspond to the destination IP address in the new outer header, the port will be correct, etc.).
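
In P4 terms, the result of points 1 and 2 typically looks something like the sketch below (a fragment, not a complete program; the names are illustrative, and the point is that every action parameter is computed and kept up to date by the NOS):

```
action set_nexthop(bit<48> dmac, bit<48> smac, bit<12> vlan, bit<9> port) {
    hdr.ethernet.dstAddr = dmac;           // resolved by the NOS (e.g. via ARP)
    hdr.ethernet.srcAddr = smac;
    meta.out_vlan        = vlan;           // assumed user metadata field
    standard_metadata.egress_spec = port;  // NOS updates this if the MAC moves
    hdr.ipv4.ttl = hdr.ipv4.ttl - 1;
}
table ipv4_route {
    key = { hdr.ipv4.dstAddr : lpm; }
    actions = { set_nexthop; NoAction; }
    default_action = NoAction();
}
```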

The mirroring scenario is no different (and many remote mirroring protocols, such as GRE-based ones, add a bunch of new headers): typically the HW data plane should not be performing a lookup based on the new outer headers. Instead, it is the responsibility of the control plane to know that for a certain mirror session you will be encapsulating the packet into an IP packet with destination x.y.z.w, and it therefore needs to form the proper Ethernet header and also send the packet to the proper port.
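
Concretely, in v1model the egress side of this might look like the following sketch (the headers, the mirror_encap table and its parameters are illustrative; PKT_INSTANCE_TYPE_INGRESS_CLONE = 1 is the conventional BMv2 value; every parameter arrives pre-resolved from the control plane):

```
#define PKT_INSTANCE_TYPE_INGRESS_CLONE 1

// dip is the mirror destination x.y.z.w; dmac is the MAC of the nexthop
// that leads to it. All of this is computed by the control plane.
action encap_mirror(bit<48> dmac, bit<48> smac, bit<32> sip, bit<32> dip) {
    hdr.outer_ipv4.setValid();             // plus GRE etc., omitted here
    hdr.outer_ipv4.srcAddr = sip;
    hdr.outer_ipv4.dstAddr = dip;
    hdr.ethernet.dstAddr   = dmac;
    hdr.ethernet.srcAddr   = smac;
}
// Keyless table: the control plane installs the parameters by setting
// the default action (or add a key if you have several mirror sessions).
table mirror_encap {
    actions = { encap_mirror; NoAction; }
    default_action = NoAction();
}
apply {
    if (standard_metadata.instance_type == PKT_INSTANCE_TYPE_INGRESS_CLONE) {
        mirror_encap.apply();
    }
}
```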

It is certainly possible to try to implement a “smarter” data plane program that will attempt to perform this resolution on its own, but usually you will hit all the problems you’ve described and then some, because (a) P4 controls cannot have loops and (b) each P4 object (e.g. a table) can be “touched” by the same packet only once.

Happy hacking,
Vladimir

Thank you very much! So maybe I could add some specific logic to the control plane to solve this problem?

Here is what is typically done:

  1. In your control plane you specify that the mirrored packets should be sent to the IP address 1.2.3.4.
  2. The control plane resolves this address to the nexthop information, meaning that it figures out the egress port, the new MAC addresses, the new VLAN, etc.
  3. The control plane programs the mirror destination in the device to have this egress port, and it also programs the tables in the egress control that are responsible for encapsulation.
  4. The control plane continues to monitor the situation, so that if for some reason the host 1.2.3.4 itself moves (e.g. the ARP stack can notice that), or the router that leads to the network 1.2.3.0/24 moves, or the port goes down, etc., then it re-programs the necessary tables that comprise the mirror destination (see the example below).
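
With BMv2’s simple_switch_CLI, for example, this could boil down to a handful of commands (the session number, port numbers, addresses and the mirror_encap table are all illustrative):

```
# steps 1-3: mirror session 5 sends clones out port 3, the port the
# control plane resolved for 1.2.3.4
mirroring_add 5 3
# step 3: program the egress encapsulation with the resolved nexthop info
# (parameters: dmac smac sip dip)
table_set_default mirror_encap encap_mirror aa:bb:cc:dd:ee:ff 08:00:27:01:02:03 10.0.0.1 1.2.3.4
# step 4: if 1.2.3.4 later moves behind port 7, re-program the session
mirroring_delete 5
mirroring_add 5 7
```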

Thanks for all your replies! They helped me a lot.