I enabled priority queues in simple_switch_grpc using the corresponding command line option --priority-queues [number of queues], and I’m queueing packets using the standard metadata field priority.
I also used the set_queue_rate command from simple_switch_CLI to set custom rates for different queues.
Initially, I thought that a higher priority number meant higher precedence. However, after some tests with iperf, using queue rates that allocate 0.5 Mbps to class 1 and 1 Mbps to class 2 (all packets are the same size), I found that the packets exit the switch at a rate of 0.5 Mbps regardless of the assigned priority class.
However, if I send only priority class 1 packets, I get the expected rate of 0.5 Mbps, and if I send only priority class 2 packets, I get the expected rate of 1 Mbps.
So, how are the priorities implemented? Are they just queues and is “priority” just a misleading name?
You can find the following text: qid: when there are multiple queues servicing each egress port (e.g. when priority queueing is enabled), each queue is assigned a fixed unique id, which is written to this field. Otherwise, this field is set to 0. If priority queueing is enabled, the qid also describes the priority level of each queue, starting at 0, which has the lowest priority, up to number_of_priority_queues - 1, which has the highest priority. The number of priority queues for each port can be defined by adding --priority-queues when running simple_switch.
I have not personally attempted to use that command when there were multiple priority queues per port. Did you specify both a port and a priority value in your set_queue_rate command(s)? Or only a port?
My understanding of the implementation in BMv2 is that it implements strict priority queueing between different queues, i.e. among all queues that are “eligible” to send a packet (see below), the one with the highest numerical priority is selected.
By “eligible”, I mean:
the queue has at least one packet in it
the queue has recently been below its configured maximum transmission rate. I do not know the time constants involved here, e.g. whether it is “it hasn’t exceeded its rate over the last 1 second” or “it hasn’t exceeded its rate over the last 1 millisecond”, or some other time interval, but there is likely to be some such interval in the implementation.
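The behavior described above can be sketched in Python. This is a simplified model of strict priority scheduling with per-queue rate limits, not the actual BMv2 code; the token-bucket mechanism, burst size, and time constants are all assumptions made for illustration:

```python
from collections import deque

class TokenBucket:
    """Toy rate limiter: a queue is 'eligible' only if it has a full token.
    Burst size and refill model are assumptions, not BMv2 internals."""
    def __init__(self, rate_pps, burst=1.0):
        self.rate = rate_pps
        self.burst = burst
        self.tokens = burst
        self.last = 0.0

    def eligible(self, now):
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        return self.tokens >= 1.0

    def consume(self):
        self.tokens -= 1.0

def schedule(queues, buckets, now):
    """Among non-empty queues whose rate limit permits sending,
    pick the one with the highest numeric priority (strict priority)."""
    for prio in sorted(queues, reverse=True):
        if queues[prio] and buckets[prio].eligible(now):
            return prio
    return None

# Two queues: priority 1 capped at 0.5 pkt/s, priority 0 at 1 pkt/s (toy numbers).
queues = {0: deque(["p0-a", "p0-b"]), 1: deque(["p1-a", "p1-b"])}
buckets = {0: TokenBucket(1.0), 1: TokenBucket(0.5)}

sent = []
for t in [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]:
    q = schedule(queues, buckets, t)
    if q is not None:
        buckets[q].consume()
        sent.append(queues[q].popleft())
print(sent)  # ['p1-a', 'p0-a', 'p0-b', 'p1-b']
```

Note how the high-priority queue is served whenever it is eligible, but the low-priority queue fills the gaps while the high-priority queue is being rate-limited, which is consistent with the two eligibility conditions above.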
This is a feature area in BMv2’s implementation where I have not done my own personal testing. You are welcome to browse through the C++ implementation code, and/or add debug print statements and rebuild it from source, which can often aid in understanding how it works today, and may lead to ideas for improvements.
First, it’s important to differentiate between queuing priorities and shaping. They are largely orthogonal.
Second, it’s important to remember that queue priorities really start taking effect only in case of congestion. If there is no congestion, they do not really matter.
Third, it’s important to be aware of how the tools work to make sure things are really done in parallel, otherwise there will be little to see.
With BMv2 things get a little murkier, since it uses virtual ports. Thus, the first thing I’d recommend is to use Mininet’s or Linux’s facilities to cap the egress port bandwidth.
I would also use two instances of iperf injecting packets into two different ingress ports, while directing those packets to a single egress port with the capped bandwidth.
I have to admit that I have not personally tried that on BMv2, but I’ve seen plenty of similar cases on real devices, such as Tofino, where the issue was not with the implementation but with the testing.
I’m already capping the bandwidth of the egress link to 1 Mbps using Mininet. I wanted to give more bandwidth and priority to packets of a specific priority class.
In my use case I’m dealing with multiple services sending packets from the same host, so I can’t send the packets on different ports according to their priority.
I have used the ability to set a maximum veth link rate between two switches in Mininet, which is done in at least one of the exercises in the p4lang/tutorials repository on GitHub.
I have also done a little bit of testing and investigation with BMv2’s priority queueing implementation, and it seems to work at a basic level, although I don’t think I ever attempted to use set_queue_rate to configure maximum rates on the different priority queues in that very limited testing.
Someone else besides me may have tested what you are trying on BMv2 before, but if so, I do not know who that is. It is certainly possible that there are bugs in BMv2 when configuring multiple priority queues and using set_queue_rate to configure maximum rates on one or more of those queues. I do not know.
It should be possible to create congestion even if you are sending packets into a single ingress port and forwarding them to a single egress port, as long as the egress bandwidth is less than the ingress bandwidth.
You would definitely see the effects of priority queuing and shaping on the hardware device.
With BMv2 things might be more complicated, depending on how the congestion on the egress port is propagated to the BMv2 traffic manager. In other words, at some point it should “know” that it cannot send the next packet out of the interface and needs to wait. That is precisely when prioritization occurs: while waiting to send a lower-priority packet, it might receive a higher-priority one and send it instead.
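A toy model of that “wait and reorder” effect can be written in a few lines of Python. This is purely illustrative of the idea, not of BMv2 internals; the packet names and the heap-based queue are assumptions:

```python
import heapq

# Packets waiting in the traffic manager: (negated_priority, arrival_seq, name).
# heapq pops the smallest tuple, so negating the priority gives
# highest-priority-first, with FIFO order among equal priorities.
tm = []
seq = 0

def enqueue(priority, name):
    global seq
    heapq.heappush(tm, (-priority, seq, name))
    seq += 1

# A low-priority packet arrives and must wait for the busy egress link.
enqueue(0, "low-1")
# While it waits, a high-priority packet and another low-priority one arrive.
enqueue(2, "high-1")
enqueue(0, "low-2")

# When the link frees up, the traffic manager dequeues by priority:
order = [heapq.heappop(tm)[2] for _ in range(len(tm))]
print(order)  # high-1 overtakes the waiting low-priority packets
```

The point is that prioritization only has something to act on when packets actually accumulate while waiting for the link; if the link is never busy, arrival order and departure order coincide and priorities have no visible effect.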
I’m not sure this is how it is implemented, though.