Hi!
When I write a P4 program for BMv2 using the v1model architecture, I need to monitor queue length information for my congestion control algorithm. I obtain this information by reading standard_metadata.deq_qdepth in the egress pipeline as each packet is dequeued, which gives me a reasonably real-time view of the queue.
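For context, the monitoring part of my egress control is essentially the following (simplified sketch; the control and struct names are placeholders for my real program):

    control MyEgress(inout headers hdr,
                     inout metadata meta,
                     inout standard_metadata_t standard_metadata) {
        apply {
            // deq_qdepth is the queue depth observed when this packet was
            // dequeued from the traffic manager (19 bits wide in v1model).
            log_msg("deq_qdepth={}", {standard_metadata.deq_qdepth});
        }
    }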
My congestion control process works as follows: I implemented a real-time token bucket in the ingress (I didn't use the built-in meter because I need to dynamically update the rate in the data plane, which meter doesn't allow). When the queue length increases, I reduce the rate using an algorithm similar to QCN; otherwise, I linearly increase the rate over time (currently 400 bps every 50 ms on a 1 Mbps link).
However, when I examined the monitoring data, I noticed sudden increases in queue length. For example, in the following logs (first column: timestamp, second column: standard_metadata.deq_qdepth printed via log_msg):
23:44:09.487,0
23:44:09.488,0
23:44:09.496,0
23:44:09.497,0
23:44:09.505,0
23:44:09.506,0
23:44:09.514,0
23:44:09.645,27
23:44:09.646,26
23:44:09.647,26
23:44:09.647,25
23:44:09.648,24
23:44:09.649,23
23:44:09.650,22
23:44:09.651,21
23:44:09.651,20
23:44:09.652,19
23:44:09.652,18
23:44:09.653,17
23:44:09.653,16
23:44:09.654,15
23:44:09.655,14
23:44:09.655,13
23:44:09.656,12
23:44:09.656,11
23:44:09.657,10
23:44:09.658,9
23:44:09.658,8
23:44:09.659,7
23:44:09.659,6
23:44:09.659,5
23:44:09.659,4
23:44:09.660,3
23:44:09.661,2
23:44:09.661,1
23:44:09.662,0
23:44:09.665,0
23:44:09.675,0
23:44:09.685,0
23:44:09.689,0
23:44:09.706,0
23:44:09.707,0
23:44:09.718,0
Between 23:44:09.514 and 23:44:09.645 there is a time gap of over 100 ms, and the queue length suddenly jumps to 27. Prior to that, packets were dequeuing roughly every 8 ms on average. This sudden jump seems odd.
Since my rate control implementation at the sending edge is based on a pacing interval derived from the rate (it is not window-based), it shouldn't produce a burst of packets all at once.
My questions are:
What could be the reason for this behavior? Is it possible that BMv2 processes enqueues/dequeues in batches, so that the deq_qdepth reading is not guaranteed to be accurate for every individual packet?

If so, is this batching behavior specific to BMv2, or should I assume that commercial switches exhibit similar batch processing and design my congestion control algorithm with that in mind?