I enabled priority queues in simple_switch_grpc using the corresponding command line option --priority-queues [number of queues], and I’m queueing packets using the standard metadata field priority.
I also used the set_queue_rate command from simple_switch_CLI to set custom rates for different queues.
Initially, I thought that a higher priority number meant higher precedence. However, after some tests with iperf, using queue rates that allocate 0.5 Mbps to class 1 and 1 Mbps to class 2 (all packets are the same size), I found that the packets exit the switch at a rate of 0.5 Mbps regardless of the assigned priority class.
However, if I send only priority class 1 packets, I get the expected rate of 0.5 Mbps, and if I send only priority class 2 packets, I get the expected rate of 1 Mbps.
So, how are the priorities implemented? Are they just queues and is “priority” just a misleading name?
You can find the following text: qid: when there are multiple queues servicing each egress port (e.g. when priority queueing is enabled), each queue is assigned a fixed unique id, which is written to this field. Otherwise, this field is set to 0. If priority queueing is enabled, the qid also describes the priority level of each queue, starting at 0, which has the lowest priority, up to number_of_priority_queues - 1, which has the highest priority. The number of priority queues for each port can be defined by adding --priority-queues when running simple_switch.
I have not personally attempted to use that command when there were multiple priority queues per port. Did you specify both a port and a priority value in your set_queue_rate command(s)? Or only a port?
My understanding of the implementation in BMv2 is that it implements strict priority queueing between different queues, i.e. among all queues that are “eligible” to send a packet (see below), the one with the highest numerical priority is selected.
By “eligible”, I mean:
the queue has at least one packet in it
the queue has recently been below its configured maximum transmission rate. I do not know the time constants involved here, e.g. whether it is “it hasn’t exceeded its rate over the last 1 second” or “it hasn’t exceeded its rate over the last 1 millisecond”, or some other time interval, but there is likely to be some such interval in the implementation.
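The behavior described above can be sketched in Python. This is a simplified model of strict priority scheduling with per-queue rate limits, not the actual BMv2 code; the token-bucket mechanism, burst size, and time constants are all assumptions made for illustration:

```python
from collections import deque

class TokenBucket:
    """Toy rate limiter: a queue is 'eligible' only if it has a full token.
    Burst size and refill model are assumptions, not BMv2 internals."""
    def __init__(self, rate_pps, burst=1.0):
        self.rate = rate_pps
        self.burst = burst
        self.tokens = burst
        self.last = 0.0

    def eligible(self, now):
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        return self.tokens >= 1.0

    def consume(self):
        self.tokens -= 1.0

def schedule(queues, buckets, now):
    """Among non-empty queues whose rate limit permits sending,
    pick the one with the highest numeric priority (strict priority)."""
    for prio in sorted(queues, reverse=True):
        if queues[prio] and buckets[prio].eligible(now):
            return prio
    return None

# Two queues: priority 1 capped at 0.5 pkt/s, priority 0 at 1 pkt/s (toy numbers).
queues = {0: deque(["p0-a", "p0-b"]), 1: deque(["p1-a", "p1-b"])}
buckets = {0: TokenBucket(1.0), 1: TokenBucket(0.5)}

sent = []
for t in [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]:
    q = schedule(queues, buckets, t)
    if q is not None:
        buckets[q].consume()
        sent.append(queues[q].popleft())
print(sent)  # ['p1-a', 'p0-a', 'p0-b', 'p1-b']
```

Note how the high-priority queue is served whenever it is eligible, but the low-priority queue fills the gaps while the high-priority queue is being rate-limited, which is consistent with the two eligibility conditions above.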
This is a feature area in BMv2’s implementation where I have not done my own personal testing. You are welcome to browse through the C++ implementation code, and/or add debug print statements and rebuild it from source, which can often aid in understanding how it works today, and may lead to ideas for improvements.
First, it’s important to differentiate between queuing priorities and shaping. They are largely orthogonal.
Second, it’s important to remember that queue priorities really start taking effect only in case of congestion. If there is no congestion, they do not really matter.
Third, it’s important to be aware of how the tools work to make sure things are really done in parallel, otherwise there will be little to see.
With BMv2 things get a little murkier, since it uses virtual ports. Thus, the first thing I’d recommend is to use Mininet’s or Linux’s facilities to cap the egress port bandwidth.
I would also use two instances of iperf injecting packets into two different ingress ports, while directing those packets to a single egress port with the capped bandwidth.
I have to admit that I have not personally tried that on BMv2, but I’ve seen plenty of similar cases on real devices, such as Tofino, where the issue was not with the implementation but with the testing.
I’m already capping the bandwidth of the egress link to 1 Mbps using Mininet. I wanted to give more bandwidth and priority to packets of a specific priority class.
In my use case I’m dealing with multiple services sending packets from the same host, so I can’t send the packets on different ports according to their priority.
I have used the ability to set a maximum veth link rate between two switches in Mininet, which is done in at least one of the exercises in the p4lang/tutorials repository on GitHub.
I have also done a little bit of testing and investigation with BMv2’s priority queueing implementation, and it seems to work at a basic level, although I don’t think I ever attempted to use set_queue_rate to configure maximum rates on the different priority queues in that very limited testing.
Someone else besides me may have tested what you are trying on BMv2 before, but if so, I do not know who that is. It is certainly possible that there are bugs in BMv2 when configuring multiple priority queues and using set_queue_rate to configure maximum rates on one or more of those queues. I do not know.
It should be possible to create congestion even if you are sending packets into a single ingress port and forwarding them to a single egress port, as long as the egress bandwidth is less than the ingress bandwidth.
You would definitely see the effects of priority queuing and shaping on the hardware device.
With BMv2 things might be more complicated, depending on how the congestion on the egress port is propagated to the BMv2 traffic manager. In other words, at some point it should “know” that it cannot send the next packet out of the interface and needs to wait. That is precisely when prioritization occurs: while waiting to send a lower-priority packet, it might receive a higher-priority one and send it instead.
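A toy model of that “wait and reorder” effect can be written in a few lines of Python. This is purely illustrative of the idea, not of BMv2 internals; the packet names and the heap-based queue are assumptions:

```python
import heapq

# Packets waiting in the traffic manager: (negated_priority, arrival_seq, name).
# heapq pops the smallest tuple, so negating the priority gives
# highest-priority-first, with FIFO order among equal priorities.
tm = []
seq = 0

def enqueue(priority, name):
    global seq
    heapq.heappush(tm, (-priority, seq, name))
    seq += 1

# A low-priority packet arrives and must wait for the busy egress link.
enqueue(0, "low-1")
# While it waits, a high-priority packet and another low-priority one arrive.
enqueue(2, "high-1")
enqueue(0, "low-2")

# When the link frees up, the traffic manager dequeues by priority:
order = [heapq.heappop(tm)[2] for _ in range(len(tm))]
print(order)  # high-1 overtakes the waiting low-priority packets
```

The point is that prioritization only has something to act on when packets actually accumulate while waiting for the link; if the link is never busy, arrival order and departure order coincide and priorities have no visible effect.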
I’m not sure this is how it is implemented, though.