Do ingress and egress share the same queue?

Hello everyone, I have some questions about the queue.

  1. In BMv2, we can only read the queue depth at egress. Does this mean that ingress and egress share the same queue?
  2. If so, could we modify the source code so that one port corresponds to two FIFOs? Would that let ingress and egress each use their own FIFO?

Hi @junming,

  1. In the V1Model and PSA architectures, between the ingress pipeline and the egress pipeline there is a block called the traffic manager, which is what manages the queues. So we can say that the ingress pipeline and the egress pipeline share the same “queues.” But these queues are inside the ASIC/software switch and are not the ones on the physical interfaces (according to the spec, those are not managed by the P4 language). The V1Switch package declaration sketched after this list shows where this block sits.

  2. I think it is possible, but I don’t know the details of how this part is implemented in the BMv2 switch.
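
To illustrate point 1: in v1model the traffic manager sits between the Ingress and Egress controls and is not itself programmable; it does not even appear in the top-level package. Paraphrased from p4c’s v1model.p4 (the comment is added here for illustration):

```p4
package V1Switch<H, M>(Parser<H, M> p,
                       VerifyChecksum<H, M> vr,
                       Ingress<H, M> ig,
                       // the traffic manager and its queues sit here,
                       // between ingress and egress, outside the P4 program
                       Egress<H, M> eg,
                       ComputeChecksum<H, M> ck,
                       Deparser<H> dep);
```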

When packets arrive at the BMv2 software switch, they first go through ingress processing. Ingress chooses an output port, and the BMv2 traffic manager by default has one FIFO queue per output port (you can change a couple of lines of code and recompile BMv2 from source, and then it will have K FIFO queues per output port, where K is a value in the source code you can choose).

When the packet scheduler chooses which packet should next go out of the switch on a given port, it dequeues that packet from one of the K FIFO queues for that output port and sends it for egress processing, after which the packet goes out of that port.
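
For reference, these are the queueing-related fields of `standard_metadata_t` in v1model (paraphrased from p4c’s v1model.p4 and the simple_switch documentation; check your local copy for the exact definition). simple_switch fills them in when the packet leaves the traffic manager, so they only carry meaningful values during egress processing:

```p4
struct standard_metadata_t {
    // ... other fields ...
    bit<32> enq_timestamp;   // when the packet was enqueued, in microseconds
    bit<19> enq_qdepth;      // queue depth at enqueue time, in packets
    bit<32> deq_timedelta;   // time the packet spent in the queue, in microseconds
    bit<19> deq_qdepth;      // queue depth at dequeue time, in packets
    // ... other fields ...
}
```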

There are ways to make slightly older versions of the FIFO queue depths available to read during ingress processing, but it takes some work on your part to write P4 code to specifically make that happen. For example: every M packets in egress processing, recirculate a copy of the packet containing metadata with the queue depth, and when those recirculated packets are next processed in ingress, write the queue depths they contain into a P4 register array, with one element per FIFO queue.
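
To make that concrete, below is a minimal, untested sketch of that idea for the v1model architecture, assuming a reasonably recent p4c and simple_switch (it uses `clone_preserving_field_list`, `recirculate_preserving_field_list`, and the `@field_list` annotation). Everything in it is invented for illustration: the names (`qdepth_reg`, `SAMPLE_PERIOD`, `MIRROR_SESSION`, ...), the constants, and the deliberately trivial forwarding logic. The clone session would have to be created at runtime (e.g. with simple_switch_CLI’s `mirroring_add`), and the exact interaction of clones with recirculation is worth checking against the simple_switch documentation.

```p4
#include <core.p4>
#include <v1model.p4>

// Packet instance types, per the simple_switch documentation.
const bit<32> PKT_INSTANCE_TYPE_EGRESS_CLONE = 2;
const bit<32> PKT_INSTANCE_TYPE_RECIRC       = 4;

const bit<32> MIRROR_SESSION = 250;  // clone session id; must be configured at runtime
const bit<32> SAMPLE_PERIOD  = 64;   // "every M packets"
const bit<32> N_PORTS        = 512;  // one register element per output port queue

header ethernet_t { bit<48> dst; bit<48> src; bit<16> etherType; }
struct headers_t  { ethernet_t eth; }

struct metadata_t {
    // Fields tagged @field_list(1) survive the E2E clone and the recirculation.
    @field_list(1) bit<19> sampled_qdepth;
    @field_list(1) bit<9>  sampled_port;
}

parser MyParser(packet_in pkt, out headers_t hdr, inout metadata_t meta,
                inout standard_metadata_t std) {
    state start { pkt.extract(hdr.eth); transition accept; }
}

control MyVerifyChecksum(inout headers_t hdr, inout metadata_t meta) { apply { } }

control MyIngress(inout headers_t hdr, inout metadata_t meta,
                  inout standard_metadata_t std) {
    // One (slightly stale) queue depth per output port, updated by recirculated samples.
    register<bit<19>>(N_PORTS) qdepth_reg;
    apply {
        if (std.instance_type == PKT_INSTANCE_TYPE_RECIRC) {
            // A recirculated sample: record the depth it carries, then drop it.
            qdepth_reg.write((bit<32>) meta.sampled_port, meta.sampled_qdepth);
            mark_to_drop(std);
        } else {
            // Normal forwarding, kept trivial here: everything goes out port 1.
            // A real program could read qdepth_reg here and act on the value.
            std.egress_spec = 1;
        }
    }
}

control MyEgress(inout headers_t hdr, inout metadata_t meta,
                 inout standard_metadata_t std) {
    register<bit<32>>(N_PORTS) pkt_count_reg;
    apply {
        if (std.instance_type == PKT_INSTANCE_TYPE_EGRESS_CLONE) {
            // The cloned copy: send it back to ingress, carrying the
            // @field_list(1) metadata (the sampled queue depth) with it.
            recirculate_preserving_field_list(1);
        } else {
            // Normal packet leaving the switch: every SAMPLE_PERIOD-th packet
            // on this port, snapshot the queue depth and make an E2E clone,
            // which is recirculated by the branch above on its next egress pass.
            bit<32> cnt;
            pkt_count_reg.read(cnt, (bit<32>) std.egress_port);
            cnt = cnt + 1;
            if (cnt >= SAMPLE_PERIOD) {
                cnt = 0;
                meta.sampled_qdepth = std.deq_qdepth;
                meta.sampled_port   = std.egress_port;
                clone_preserving_field_list(CloneType.E2E, MIRROR_SESSION, 1);
            }
            pkt_count_reg.write((bit<32>) std.egress_port, cnt);
        }
    }
}

control MyComputeChecksum(inout headers_t hdr, inout metadata_t meta) { apply { } }

control MyDeparser(packet_out pkt, in headers_t hdr) { apply { pkt.emit(hdr.eth); } }

V1Switch(MyParser(), MyVerifyChecksum(), MyIngress(), MyEgress(),
         MyComputeChecksum(), MyDeparser()) main;
```

The depths that end up in `qdepth_reg` are exactly the “slightly older versions” mentioned above: how stale they are depends on the sampling period M and on how long the recirculated copies take to reach ingress again.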

The reason the FIFO queue depths are only available in egress processing is to model a similar limitation that exists in Tofino and likely some other switch ASICs, where the queue depths are likewise only readable during egress processing.

It is definitely possible to design a switch ASIC such that queue depths are readable in ingress processing, before going to the traffic manager (at least, slightly old versions of these queue depths, that do not include the last 100 nanosec or so worth of enqueue or dequeue operations), but ASICs designed that way typically use a mechanism similar to what I described above, just implemented in ASIC gates for you, rather than you having to write P4 code to do it yourself.

Hi @andyfingerhut,
For the same port, an incoming packet does not need to go through a FIFO; it is stored in a buffer (ingress). Then the packet is sent to the FIFO after coding (egress). Does this mean that the FIFO is only used for egress?

In a switch ASIC, there could be many packets that end up being destined to the same output port, arriving across an arbitrary subset (perhaps all) of the input ports, N times faster than they can be transmitted on that output port.

You have two main choices:

  • buffer packets somewhere until they can be transmitted to the destined output port
  • drop the packets

(Yes, there are other variations that people have devised, too, e.g. truncate the packet to only its headers, and buffer that, when there is congestion. But there is only a finite amount of buffering, and it doesn’t help to ALWAYS make this choice for every packet. I will restrict the rest of this message to the two choices above).

If you buffer the packet, then at the data rates modern switch ASICs operate at, FIFO queues are a very common arrangement for buffering the packets waiting to be transmitted to an output port. You can imagine fancier non-FIFO buffering arrangements, but the fanciest in common use are a small fixed number of FIFO queues, all containing packets destined to the same output port, with a packet scheduling algorithm that chooses which of these queues the next packet sent to the output port is dequeued from.

These FIFO queues are explicitly represented in many software switches, e.g. BMv2/behavioral-model/simple_switch has by default one FIFO queue per output port. By changing a few lines of code and recompiling it, it can have a small number K of FIFO queues per output port, and a simple packet scheduling algorithm to choose between them.
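
If you do enable multiple queues per port, the P4 program can then choose which queue each packet goes to. Here is a hypothetical fragment of a v1model ingress control (not a complete program; `hdr.eth.dst`, `send_to`, and `forward` are made-up names), assuming a simple_switch that has priority queueing enabled:

```p4
// Fragment of a v1model ingress control; assumes simple_switch was built/run
// with priority queueing enabled, giving K queues per output port.
action send_to(bit<9> port, bit<3> queue) {
    standard_metadata.egress_spec = port;   // which output port
    standard_metadata.priority    = queue;  // which of that port's K queues
    // (which numeric value maps to the highest-priority queue is a
    // simple_switch detail worth checking in its documentation)
}
table forward {
    key = { hdr.eth.dst : exact; }
    actions = { send_to; NoAction; }
    default_action = NoAction();
}
apply {
    forward.apply();
}
```

Note that the priority has to be assigned before the packet reaches the traffic manager, i.e. in the parser or in ingress, since it is what selects the queue.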

A switch ASIC might have other queues or packet buffers in it besides these, but if there are others, they tend to hold quite small amounts of buffering compared to this central packet buffer, where packets are stored waiting for their destined output port to become available.

Are you thinking of packet buffering that exists for some reason OTHER than what I describe above? If so, do you have a picture or some other description of what you have in mind?