Does P4 plan to support expressing a background thread?

P4 can currently set a timeout on table entries, but that is not enough for IP fragments when the data plane wants to reassemble them.

The data plane has to send the fragments to the control CPU, which holds them until they are reassembled or time out. So I would like to know whether P4 plans to support expressing a background thread that could do this kind of work.
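To make concrete what "hold the fragments until they are reassembled or time out" involves, here is a minimal Python sketch of the kind of logic a control CPU might run. The `Reassembler` class, its method names, and the flow key layout are all hypothetical. It also simplifies real IPv4 behavior: offsets are plain byte offsets rather than 8-byte units, and overlapping fragments are not handled.

```python
from typing import Dict, Optional, Tuple

# Hypothetical flow key: (src, dst, protocol, identification)
FlowKey = Tuple[str, str, int, int]


class Reassembler:
    """Holds fragments per flow key until complete or timed out (toy model)."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.pending: Dict[FlowKey, dict] = {}

    def add(self, key: FlowKey, offset: int, payload: bytes,
            more_fragments: bool, now: float) -> Optional[bytes]:
        """Store one fragment; return the full payload once all parts are present."""
        entry = self.pending.setdefault(
            key, {"first_seen": now, "parts": {}, "total": None})
        entry["parts"][offset] = payload
        if not more_fragments:
            # The last fragment tells us the total payload length.
            entry["total"] = offset + len(payload)
        total = entry["total"]
        if total is None:
            return None
        # Walk the fragments from offset 0; stop at the first gap.
        data = bytearray()
        pos = 0
        while pos < total:
            part = entry["parts"].get(pos)
            if not part:
                return None  # a fragment is still missing
            data += part
            pos += len(part)
        del self.pending[key]
        return bytes(data)

    def expire(self, now: float) -> int:
        """The 'background' work: drop flows older than the timeout."""
        stale = [k for k, e in self.pending.items()
                 if now - e["first_seen"] >= self.timeout_s]
        for k in stale:
            del self.pending[k]
        return len(stale)
```

The `expire` method is the part that has no packet to trigger it, which is why it needs something like a timer or background thread.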

A few people have mentioned the idea of having a fixed-function component in a P4-programmable device that is similar to a Traffic Manager, in that it can store full packets and later send them back out, but different in what events trigger the storing and later reading back of the packets.

For the sake of a name, let us call this new fixed-function component a “random access packet buffer”, or RAPB for short.

Imagine writing a P4 control with some new intrinsic metadata output that said “when this P4 control is finished executing for this packet, if flag store_packet is true, then the RAPB will write this packet’s data somewhere into its memory that is currently free, i.e. not used by some other packet, and then it will send out the address where this packet was stored, or an error status indicating that there was no room available to store the packet, so it was discarded.”

The address and failed_to_store metadata values are sent as an “event” to another P4-programmable control that is logically after the RAPB, and that P4 code could do whatever you want with those values, but a typical example would be “if failed_to_store is false, record the address in some P4 register array with an appropriate index for my P4 program to read it back later”.

At any later time while processing a packet, you could output intrinsic metadata field read_packet (boolean) and a read address, and the RAPB on seeing the metadata read_packet=true would read the packet at the provided read address, and send it out to some P4-programmable control, which could then process it however you like in P4 code.

At yet another later time, you could output intrinsic metadata that could deallocate that packet. As an optimization for the common case, the RAPB could also support both reading and deallocating the packet at the same time, but having an option to read a packet but not yet deallocate would enable additional use cases, too.
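The store, read, and deallocate operations described above can be sketched as a small behavioral model in Python. This is only a toy under stated assumptions: no such fixed-function block is standardized in P4, and the class name `RAPB`, its methods, and the slot-based allocation are hypothetical. The returned pair mirrors the address and failed_to_store metadata from the description.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class RAPB:
    """Toy model of the hypothetical random access packet buffer."""
    num_slots: int
    slots: Dict[int, bytes] = field(default_factory=dict)
    free: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Initially every slot address is free.
        self.free = list(range(self.num_slots))

    def store(self, packet: bytes) -> Tuple[Optional[int], bool]:
        """Return (address, failed_to_store); the packet is dropped if full."""
        if not self.free:
            return None, True
        addr = self.free.pop()
        self.slots[addr] = packet
        return addr, False

    def deallocate(self, addr: int) -> None:
        """Release a slot so a later packet can reuse it."""
        del self.slots[addr]
        self.free.append(addr)

    def read(self, addr: int, deallocate: bool = False) -> bytes:
        """Read a stored packet, optionally freeing it in the same step."""
        packet = self.slots[addr]
        if deallocate:
            self.deallocate(addr)
        return packet


# Usage: a plain list stands in for a P4 register array of saved addresses.
rapb = RAPB(num_slots=2)
saved_addr: List[Optional[int]] = [None] * 4
addr, failed_to_store = rapb.store(b"first fragment")
if not failed_to_store:
    saved_addr[0] = addr
```

Keeping `read` and `deallocate` separate, with a combined option as the common case, matches the idea above that reading without deallocating enables additional use cases.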

I do not know of any P4-programmable device with a high-performance low-cost RAPB in it. There might be one or more that I have not heard of, though.

If I wanted to create such a P4-programmable device, I would look first at doing it in an FPGA, because then you could implement the RAPB in FPGA logic. Failing that, you could of course “implement a RAPB” using a CPU port (or multiple ports), but then its cost and power per unit of performance would be whatever you can get with a general purpose CPU, which tends to cost more and consume more power for a given packet rate.

Note: When it comes specifically to the question of things like IP fragment reassembly, an important question to ask is “what performance do I actually need for this feature in a real deployment in a network?”

If the highest rate of IP fragments you see in your network is 0.01% of all packets, then processing them all in a general purpose CPU might be perfectly acceptable, and going to the trouble of implementing a high performance low cost RAPB would be a waste of effort.

If 90% of network operators needed to do IP fragment reassembly or similar techniques for 90% of their traffic, I would guess that most switch ASICs for the last several decades would already have something like a RAPB in their hardware designs.
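As a rough back-of-the-envelope check of the 0.01% case, the sketch below compares the fragment rate against what a CPU can handle. Both function names are hypothetical, and the packet rates in the usage note are made-up illustrative figures, not measurements of any device.

```python
def fragment_pps(total_pps: float, fragment_fraction: float) -> float:
    """Fragments per second, given the total packet rate and fragment share."""
    return total_pps * fragment_fraction


def cpu_can_keep_up(total_pps: float, fragment_fraction: float,
                    cpu_pps: float) -> bool:
    """True if the assumed CPU packet rate covers the fragment rate."""
    return fragment_pps(total_pps, fragment_fraction) <= cpu_pps
```

For example, with an assumed 100 million packets per second in total and 0.01% of them fragments, the CPU would see about 10,000 fragments per second, which an assumed CPU budget of 1 million packets per second covers easily. At a 90% fragment share the same CPU would be far short.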

Thanks @andyfingerhut. I see your point. So the purpose of P4 is to focus on the 90% problem, is that right?

Anyone is welcome to add an extension to P4 for their target device that can implement a RAPB, or something fancier.

The purpose of P4 is definitely to focus on things that you want to do in the “fast path” of a network device. It is not a general purpose programming language. For example, while you might be able to offload some small parts of an implementation of a routing protocol like BGP, OSPF, or IS-IS into a P4 program, the vast majority (or all) of such a routing protocol would typically run on a general purpose CPU, written in a language other than P4.
