Hello everyone,
I am trying to implement In-band Network Telemetry using P4 on BMv2 switches. One metric I am trying to measure is the latency, i.e. the time a packet takes to travel from the source switch to the sink switch. I plan to measure this as the difference between the ingress_global_timestamp readings taken when the packet reaches the two switches.
This is proving to be difficult, because these timestamps are reported relative to when each switch started, and not relative to a global time. Hence, a direct difference between the two timestamps isn’t correct.
My question is: How can I obtain the timestamp at which each switch started, so that I can offset the ingress_global_timestamp readings to obtain the correct difference between the two timestamps?
What I have tried
I tried to obtain the difference between the Source and Sink switch startup times from the switch logs. I implemented this in the control plane by reading the log files of the source and sink switches (such as s1.log) and taking the local timestamp from the first line of each file:
Switch 1 (Source)
[16:30:01.770] [bmv2] [D] [thread 4167] Set default default entry for table 'tbl_intv1l42': intv1l42 -
Switch 3 (Sink)
[16:30:03.194] [bmv2] [D] [thread 4200] Set default default entry for table 'tbl_intv1l42': intv1l42 -
Since the Sink switch reports the same first line 1.424 seconds after the Source switch, I incremented the reported ingress_global_timestamp values from the Source by this same value, and then computed the latency.
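For concreteness, here is a minimal Python sketch of this kind of offset calculation. It assumes that ingress_global_timestamp counts microseconds from a clock that reads zero at switch start, and that the first log line is a good enough proxy for that moment (which is exactly what I am unsure about). The function names and the two per-packet readings at the end are hypothetical, chosen only for illustration. The sketch converts each reading to time of day by adding the start time of the switch that produced it, and then subtracts.

```python
import re
from datetime import datetime

# Matches the leading "[HH:MM:SS.mmm]" of a BMv2 log line.
LOG_TIME = re.compile(r"^\[(\d{2}:\d{2}:\d{2}\.\d{3})\]")


def log_time_us(line: str) -> int:
    """Return the time of day of a log line, in microseconds since midnight."""
    match = LOG_TIME.match(line)
    if match is None:
        raise ValueError(f"no timestamp at start of line: {line!r}")
    t = datetime.strptime(match.group(1), "%H:%M:%S.%f")
    return ((t.hour * 60 + t.minute) * 60 + t.second) * 1_000_000 + t.microsecond


def latency_us(source_start_us: int, source_ts_us: int,
               sink_start_us: int, sink_ts_us: int) -> int:
    """Latency after moving both readings onto the same time-of-day axis.

    Assumes both switches were started on the same day, because the log
    lines carry no date.
    """
    return (sink_start_us + sink_ts_us) - (source_start_us + source_ts_us)


source_first_line = ("[16:30:01.770] [bmv2] [D] [thread 4167] Set default "
                     "default entry for table 'tbl_intv1l42': intv1l42 -")
sink_first_line = ("[16:30:03.194] [bmv2] [D] [thread 4200] Set default "
                   "default entry for table 'tbl_intv1l42': intv1l42 -")

source_start_us = log_time_us(source_first_line)
sink_start_us = log_time_us(sink_first_line)
startup_delta_us = sink_start_us - source_start_us
print(startup_delta_us)

# Hypothetical per-packet readings (microseconds since each switch started).
example_latency_us = latency_us(source_start_us, 5_000_000,
                                sink_start_us, 3_576_250)
print(example_latency_us)
```

Any error in the estimated start times carries straight through into the computed latency, so an estimate that is off by more than the true latency can produce a negative result.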
However, I am noticing occasional problems with this approach. Sometimes, the final latency (after offset) comes out to be negative.
Question: Should I use the log line written when the Thrift server was started to compute the time difference instead?
Switch 1 (Source)
[16:30:01.906] [bmv2] [I] [thread 4167] Thrift server was started
Switch 3 (Sink)
[16:30:03.387] [bmv2] [I] [thread 4200] Starting Thrift server on port 9092
This gives a different time delta of 1.481 seconds. Using this delta as the offset didn’t give me a negative latency for the particular error example, but I’m not sure whether this would always work.
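To put a number on the disagreement, the short sketch below compares the two candidate offsets using only the four log timestamps quoted above. The helper name is hypothetical.

```python
from datetime import datetime, timedelta


def millis_between(earlier: str, later: str) -> int:
    """Whole milliseconds between two HH:MM:SS.mmm times on the same day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(later, fmt) - datetime.strptime(earlier, fmt)
    return delta // timedelta(milliseconds=1)


first_line_offset_ms = millis_between("16:30:01.770", "16:30:03.194")
thrift_offset_ms = millis_between("16:30:01.906", "16:30:03.387")
disagreement_ms = thrift_offset_ms - first_line_offset_ms

print(first_line_offset_ms, thrift_offset_ms, disagreement_ms)
```

The two offsets differ by 57 ms, so switching from one log line to the other shifts every computed latency by that amount.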
Alternative Approach
Another approach could be to modify the source code to emit the timestamp as the local time (using gettimeofday() maybe), but I would like to avoid this method if possible.
Kind Regards
Kartik Ramesh