Assume a P4-programmable switch with n pipelines. If we recirculate a packet from a pipeline, is it possible to send that recirculated packet to every pipeline?
Welcome to the community!
You’ve already asked the question here and I answered.
In the future, please try to refrain from cross-posting: it is not good form and it wastes the community's resources.
Happy hacking,
Vladimir
Dear Vladimir,
My question on the Open-Tofino repository was to find out whether Tofino switches support this feature or not.
My goal in asking the question in this forum was to understand the relevant issues from a general perspective. As I am doing research on multi-pipeline-based communication, I am also interested in the possible issues behind this feature. I have often gotten valuable insights from P4 community members. However, I should have mentioned that in my post.
Thanks for your quick reply. In the future, I will be more careful regarding cross-posting.
Thanks,
Robin
Hello Robin,
In this case, you should be more specific here, e.g., indicate that you want to collect information about all existing targets/architectures.
I will repeat what I said from the Tofino/TNA perspective: it is trivial to do by multicasting a single packet to as many recirculation ports as one desires.
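To make that concrete, here is a minimal P4_16/TNA sketch, assuming the control plane has already created a multicast group with one member per pipe, each member being that pipe's recirculation port (the group ID and port choice are placeholders, not anything mandated by the architecture):

```p4
// Sketch only (P4_16, TNA). Assumes multicast group 100 was populated
// from the control plane with one recirculation-capable port per pipe.
const MulticastGroupId_t RECIRC_ALL_PIPES = 100;  // placeholder group ID

action recirc_to_all_pipes() {
    // Hand the packet to the replication engine; it makes one copy per
    // group member, so each pipe receives the packet on its recirc port.
    ig_tm_md.mcast_grp_a = RECIRC_ALL_PIPES;
}
```

Each copy then shows up at the ingress of its pipe with ig_intr_md.ingress_port set to that pipe's recirculation port, so you can distinguish recirculated copies from fresh traffic by matching on the port.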
Separately from that, you should probably share more information about your ultimate goal. For example, it is important to understand that a single recirculation makes the target process two packets instead of one. When you recirculate the packet to N pipes (instead of 1), the system will have to process N+1 packets, and that might not be cheap.
Happy hacking,
Vladimir
Hi Vladimir,
I humbly accept that my question was not fully specified.
I am working on multi-pipeline architectures. My specific goal is to study the impact of packet recirculation on overall throughput.
Mainly I am interested in the following issues:
- A generic question: if we multicast packets for recirculation, they are sent back to the ingress/egress stage of a pipeline. At the moment they arrive there, there may already be packets waiting to be processed in that pipeline (for example, other data packets that arrived through an incoming port). In that case, which packets will be processed with higher priority?
- A hardware-specific question: the recirculated packet will experience some delay. How is this delay determined (say, for a pipeline running at a 1 GHz clock)?
Robin
Hello Robin,
I am not quite sure what there is "to study", really. Most systems I am familiar with process a certain number of packets per second, no matter where they come from.
Once you recirculate a packet once, it will be counted as two packets. If you recirculate each and every packet, you cut your performance in half. If you recirculate each and every packet 2 times, you cut your performance by a factor of 3, etc. If for each packet you create more than one recirculation copy, you will cut your performance accordingly, as I explained in my previous comment.
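Put as a formula (my notation, nothing target-specific): if a fraction f of the packets is recirculated and each selected packet makes k extra passes (or copies) through the pipeline, the effective throughput is

$$T_{\text{eff}} = \frac{T_{\text{max}}}{1 + f \cdot k}$$

so f = 1, k = 1 halves the throughput, and f = 1, k = 2 cuts it to a third, matching the numbers above.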
The exact recirculation mechanism may differ between targets and architectures. Even then, most of the time I'd assume that recirculated packets will be intermixed with the regular traffic in one way or another, but I would not recommend assuming any specific order or priority. Remember that most systems, even ones that are not the most high-performance, typically process multiple packets at the same time.
The exact time it takes to recirculate a packet is also highly target-dependent, but usually it is quite short compared to the time it takes to process the packet.
Happy hacking,
Vladimir
Hi Vladimir,
1) I am working on a project that selectively recirculates a packet based on a condition used for congestion identification (a rough sketch of what I mean follows below). In such cases, it is important to have a fine-grained model of when the recirculated packet will be processed again. Unfortunately, I cannot discuss the whole algorithm here. The goal is to determine whether we can achieve an overall performance improvement at the expense of selective recirculation.
2) Now assume the packet-forwarding logic depends on the congestion information carried by the recirculated packet. In this case, if the recirculated packet is not processed with the highest priority, the normal data packets will keep using the old path, which can lead to more congestion.
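To make 1) a bit more concrete, here is a rough P4_16/TNA-style sketch of the selection step; the header, flag, port number, and forwarding table are placeholders, and the real condition is more involved:

```p4
// Sketch only: selectively send a packet back through a recirculation port
// when a congestion hint is set. hdr.cong, RECIRC_PORT, and ipv4_forward
// are placeholders for the real header, port, and forwarding logic.
const PortId_t RECIRC_PORT = 68;  // pipe-local recirculation port (assumed)

apply {
    if (hdr.cong.isValid() && hdr.cong.flag == 1) {
        // Extra pass through the pipe; the real code would also bound
        // the number of passes to avoid recirculating forever.
        ig_tm_md.ucast_egress_port = RECIRC_PORT;
    } else {
        ipv4_forward.apply();  // normal forwarding (assumed table)
    }
}
```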
Robin
Hello Robin,
What you want to do is quite standard nowadays, but, indeed, it is just a band-aid. To implement really efficient and effective AQM algorithms, we will have to wait for the next generation of hardware.
The concerns you cite are totally real and legitimate, but the answers to them are highly architecture- and target-dependent (and also come with lots of caveats), so I highly doubt you can have a meaningful general discussion on this topic, except to note a couple of completely obvious, commonsense things:
- Modern high-speed hardware pipelines are long and usually process many packets simultaneously. That means that no matter how quickly you can recirculate your information, there will be a certain number of packets in front of you.
- Most sane pipelines do not allow packets to freely jump ahead of others (in fact, that might be part of the definition of what a pipeline is).
- You might also notice that the queueing information typically available to a given packet reflects the state of a specific queue only, and thus might or might not be that useful to the packets that are ahead of it.
- Most modern implementations use out-of-band (sideband) signaling for standard flow-control mechanisms, such as PFC, Pause, etc., precisely for that reason.
Happy hacking,
Vladimir
Hi Vladimir,
Thanks for summarizing the relevant issues.
Best regards,
Robin
I do not think that is a universally accepted definition of what a pipeline is. Instruction execution units in high-performance CPUs often allow instructions to execute out of order, yet are still typically called "pipelines". See, for example, Pipeline (computing) - Wikipedia.
I am aware of some programmable network processors and NICs that explicitly allow packets to finish processing in a different order than they began processing. For example, packet #1 could arrive and begin processing, then packet #2 could arrive and begin processing in parallel with packet #1. Packet #2 can finish processing and go out before packet #1 finishes processing and goes out. This can be useful for achieving higher packet-processing throughput in situations where there are large off-chip tables in DRAM with an on-chip SRAM cache, for example, and packet #1 experiences a cache miss while packet #2 gets a cache hit. Out-of-order finishing enables packet #2 to continue processing without being forced to wait out packet #1's cache-miss latency.
In the systems I have seen where this is possible, there is often some control over which packets are allowed to jump ahead of others, e.g. a hash of some header fields is calculated, and if two packets have the same hash value, then the system does not allow them to be reordered in this way.