What is the queue management approach in P4?

I am confused about the packet queue management approach in P4. Do all packets go into the egress port and queued? Or do they just go into the traffic management process and there are many queues in it and the packets are saved?

The P4 specification explicitly says (in Section 1, “Scope”) that it does not define “Mechanisms by which data are received by one packet-processing system and delivered to another system”.

Therefore, the answer to your question differs from system to system, and it is defined by the system’s P4 architecture. Which one are you asking about?

Most switch-like systems that have multiple ingress and egress ports typically need to have some form of a Traffic Manager so that they can deal with the bursts that inevitably arise when multiple ingress ports send traffic to a single egress port. In most cases the traffic has to be buffered and enqueued, so that packets can be processed in some order. Most modern implementations have more than one queue.

Happy hacking

Hi, thanks for your reply!
I still have some questions about that.
First, when multiple ingress ports send traffic to the same egress port, are the excess packets buffered at the TM (traffic manager) and then sent to the egress port?
Second, I have seen the queue length field in the egress processing of a P4 program, but I am confused about it. When a packet enters egress processing in P4, does that mean the packet has reached the egress port of the hardware? And does the queue length field refer to the egress port queue?

Thanks for your reply again!

In most P4 architectures for switch ASICs that I know of (e.g. v1model, PSA, TNA), yes the traffic manager buffers packets that are destined to the same output port, if those packets arrive faster than that output port can send them. That is one of the primary reasons for the existence of a traffic manager – to buffer packets when they arrive faster than they can be sent to their destination (i.e. output port).

For the queue length field, its behavior might depend upon which queue length field you mean, which P4 architecture, etc. If you are looking at the v1model architecture running on the BMv2 software switch, then the definitions of the standard/intrinsic metadata fields are documented in simple_switch.md in the p4lang/behavioral-model repository on GitHub.

Search for all occurrences of enq_qdepth and deq_qdepth to find where the meaning of these fields is documented. The queue being referred to there is the queue that the packet was placed in, within the traffic manager, before it goes to egress processing.

Typically in these kinds of switch ASICs, the traffic manager will only send packets to egress processing at exactly, or close to, the rate that the packets can be sent on the physical output port, so that no further buffering or queueing is required after the packet leaves the traffic manager.
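To make the buffering behavior concrete, here is a minimal Python sketch of a single output queue inside a traffic manager. It is a toy model, not code for any real target: the function name `simulate_output_queue`, the tick-based timing, and the drain rate of one packet per tick are all assumptions made for illustration.

```python
from collections import deque

def simulate_output_queue(arrivals_per_tick, drain_per_tick=1):
    """Toy model of one output queue in a traffic manager.

    arrivals_per_tick: packets destined to this output port in each tick
    drain_per_tick: packets the output port can send per tick (assumed)
    Returns the queue depth (in packets) at the end of every tick.
    """
    queue = deque()
    depths = []
    next_id = 0
    for arrivals in arrivals_per_tick:
        # Buffer every packet that arrived for this port during the tick.
        for _ in range(arrivals):
            queue.append(next_id)
            next_id += 1
        # Hand packets to egress processing no faster than the port rate.
        for _ in range(min(drain_per_tick, len(queue))):
            queue.popleft()
        depths.append(len(queue))
    return depths

# Three ingress ports burst toward one output port for two ticks, then go idle.
print(simulate_output_queue([3, 3, 0, 0, 0, 0]))  # [2, 4, 3, 2, 1, 0]
```

The queue grows while packets arrive faster than the port drains and shrinks once the burst ends. Because packets leave the queue at the port rate, nothing in this model needs to be buffered again after the traffic manager.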


I just wanted to re-emphasize a very important point @andyfingerhut has made.

You wrote “I have seen the queue length field in the egress process of P4 program”. Instead, the truth is that you have seen it in a P4 program written for a specific architecture (be it v1model, PSA, TNA – you neglected to tell us). Different architectures use very different (implementations of) traffic managers and the semantics might be quite different, even if the fields have the same name.

As an example, in the v1model architecture, the fields enq_qdepth and deq_qdepth specify the queue length in packets (at the moment the given packet was enqueued and dequeued, respectively), while in the Tofino Native Architecture (TNA) the length is specified in terms of the number of bytes (cells) that are occupied by those packets. Obviously, these differences are quite fundamental. Similarly, you can find that some traffic managers implement physical, per-port queues, while others use a Virtual Output Queueing (VOQ) approach, and that can affect the semantics as well.

So, study the corresponding architecture (while remembering that P4 programs cannot be written without one), and that will allow you to ask more pointed questions.
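A small Python sketch shows why the unit matters. The cell size of 80 bytes and both helper names are assumptions chosen only for illustration; check the documentation of your target for the real values.

```python
import math

CELL_BYTES = 80  # assumed cell size, for illustration only

def depth_in_packets(packet_sizes):
    """Queue depth counted in packets."""
    return len(packet_sizes)

def depth_in_cells(packet_sizes, cell_bytes=CELL_BYTES):
    """Queue depth counted in buffer cells; each packet uses whole cells."""
    return sum(math.ceil(size / cell_bytes) for size in packet_sizes)

small = [64, 64, 64]        # three minimum-size packets
large = [1500, 1500, 1500]  # three full-size packets

print(depth_in_packets(small), depth_in_packets(large))  # 3 3
print(depth_in_cells(small), depth_in_cells(large))      # 3 57
```

Both queues report the same depth when counted in packets, but very different depths when counted in cells, so a threshold tuned for one architecture cannot simply be reused on the other.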


Hi, sorry for leaving out the details.
I have seen the fields enq_qdepth and deq_qdepth in the tofino.p4 file, and I am running a P4 switch with a Tofino ASIC.
I learned from your reply that the excess packets are buffered at the TM (in queues inside the TM?) and then transmitted to the egress port. So, can we modify the queue management approach in the TM from a P4 program, for example by assigning a special queue to special packets, or by making other modifications? Is there any relevant code I can refer to?


Thank you. It was not clear that you were asking about Tofino and TNA. I’ll give you quick answers here, but the best places to ask questions about Tofino and TNA are the Tofino-specific forums maintained by Intel. If you represent an academic or research organization, please use the Intel Connectivity Research Program (ICRP) Forum. If you represent a commercial organization, please use Intel Premier Support (IPS).

Here are the answers to your questions:

Q1: Can we modify queue management approach in TM by P4 program?
A1: No, this is not possible, because the Traffic Manager is a fixed-function component, not programmable in P4. Having said that, the Tofino Traffic Manager is a highly configurable component that offers a variety of different packet treatments for all kinds of use cases.

Q2: Is it possible to assign a special queue to special packets?
A2: Yes, it is possible. The Tofino Traffic Manager has rich intrinsic metadata (please see the structure ingress_intrinsic_metadata_for_tm_t). Among its fields, there is a qid field that allows the P4 program to select a desired queue for each packet. The value of this field can be computed by the P4 program using any desired algorithm. There are other fields that influence packet treatment, such as ingress_cos, packet_color, etc. I will not be able to discuss them in detail here; please use the forums I mentioned above.
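As an illustration of “any desired algorithm”, here is a minimal Python model of one possible queue-selection policy. It is not TNA code: the function `select_qid`, the choice of eight queues, and the mapping from the top three DSCP bits to a queue are all hypothetical. In a real program the computed value would be written to the qid field mentioned above.

```python
def select_qid(dscp, num_queues=8):
    """Hypothetical policy: use the top 3 bits of the 6-bit DSCP as the queue ID."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value")
    # Clamp so the result is valid even when fewer than 8 queues exist.
    return min(dscp >> 3, num_queues - 1)

print(select_qid(0))   # best effort -> queue 0
print(select_qid(46))  # Expedited Forwarding -> queue 5
print(select_qid(63, num_queues=4))  # clamped to the highest queue -> 3
```

The classification itself is ordinary match-action logic in ingress. What the traffic manager then does with each queue (priorities, shaping, and so on) is configured separately and is not expressed in P4.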

Q3: Is there any relevant code I can refer to?
A3: Please visit the Open-Tofino GitHub repository. It contains the documentation and some examples. I would also highly recommend attending Intel Connectivity Academy (ICA) classes.


Thanks for your detailed reply.
I will turn to Intel Premier Support for more detailed answers.

Thank you very much for your earlier detailed answer. I would like to ask some similar questions about v1model.
Q1: Can we get the length of each queue from the ingress pipeline? I want to compare the lengths of two specified queues and do some scheduling operations.
Q2: Is it possible to dequeue one packet from a specified queue and then enqueue it into another queue? I want to implement some operations similar to deflection.
If you can answer my questions, I would appreciate it!

Q1: There is no direct way designed into the v1model architecture to read queue depths from ingress.

You could implement a way with some P4 code written by you, and probably a bit of control plane code as well, that would periodically send packets through ingress, then egress, where packet X would read queue depth X in egress, then recirculate, then write queue depth X in a P4 register in ingress. You would need different packets to update the P4 register for each different queue number X you wanted to be kept relatively up to date in ingress. Then your “normal” data packets could read that P4 register in ingress to get a relatively recent value of the queue depth. The more often the updater packets are sent through, the more up-to-date the queue depth values readable in ingress will be.
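The probe-packet idea can be sketched as a small Python model. It is only a simulation of the data flow, not P4 or BMv2 code: the class `SwitchModel`, its method names, and the way true queue depths are set are assumptions made for illustration.

```python
class SwitchModel:
    """Toy model: queue depths are visible in egress, a register is visible in ingress."""

    def __init__(self, num_queues):
        self.tm_depth = [0] * num_queues     # true depths inside the traffic manager
        self.ingress_reg = [0] * num_queues  # stands in for the P4 register read by ingress

    def send_probe(self, qid):
        # Egress: the probe for queue `qid` reads that queue's depth.
        observed = self.tm_depth[qid]
        # Recirculate, then ingress: store the observed depth in the register.
        self.ingress_reg[qid] = observed

    def ingress_pick_shorter(self, q_a, q_b):
        # A normal data packet compares the (possibly stale) register values.
        return q_a if self.ingress_reg[q_a] <= self.ingress_reg[q_b] else q_b

sw = SwitchModel(num_queues=2)
sw.tm_depth = [5, 2]
sw.send_probe(0)
sw.send_probe(1)
print(sw.ingress_pick_shorter(0, 1))  # 1

sw.tm_depth = [1, 9]                  # depths change, but no probe has run yet
print(sw.ingress_pick_shorter(0, 1))  # still 1, because the register is stale

sw.send_probe(0)
sw.send_probe(1)
print(sw.ingress_pick_shorter(0, 1))  # 0
```

The middle step shows the main trade-off: between probes, ingress decisions are based on an old snapshot, so the probe rate bounds how stale the depth values can be.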

Q2: There is no direct way to dequeue a packet from one queue and then enqueue it into another queue. There is likely a way that you could, in egress, perform an egress-to-egress mirror/clone operation on the packet, to a clone session that will send the cloned copy to the desired output queue you want.


Thanks for the ideas you provided. I know what to do next.