How to obtain Switch Startup Time to Synchronize Ingress Timestamp

Hello everyone,

I am trying to implement In-band Network Telemetry using P4 on BMv2 switches, and one metric I want to measure is the time a packet takes to travel from the source switch to the sink switch (latency). I plan to measure this as the difference between the ingress_global_timestamp readings taken when the packet reaches each of the two switches.

This is proving to be difficult, because these timestamps are reported relative to when each switch started, and not relative to a global time. Hence, a direct difference between the two timestamps isn’t correct.

My question is: How can I obtain the timestamp at which each switch started, so that I can offset the ingress_global_timestamp to obtain the correct difference between the two timestamps?

What I have tried

I tried to obtain the difference between the source and sink switches' startup times from the switch logs. I implemented this in the control plane by opening each switch's log file (such as s1.log) and reading the local timestamp from its first line:

Switch 1 (Source)

[16:30:01.770] [bmv2] [D] [thread 4167] Set default default entry for table 'tbl_intv1l42': intv1l42 -

Switch 3 (Sink)

[16:30:03.194] [bmv2] [D] [thread 4200] Set default default entry for table 'tbl_intv1l42': intv1l42 - 

Since the Sink switch reports the same first line 1.424 seconds after the Source switch, I offset the reported ingress_global_timestamp values from the Source by this same value and then computed the latency.
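For reference, here is a minimal sketch of that offset arithmetic (the log lines are shortened and the ingress_global_timestamp values are made up; it assumes the offset is simply the difference between the two first-log-line timestamps and that latency is computed as the sink reading minus the source reading):

```cpp
#include <cstdint>
#include <cstdio>

// Parse a BMv2 log prefix such as "[16:30:01.770]" into microseconds
// since midnight (the log lines carry only a time of day, not a date).
static int64_t log_ts_to_us(const char *line) {
  int h = 0, m = 0, s = 0, ms = 0;
  std::sscanf(line, "[%d:%d:%d.%d]", &h, &m, &s, &ms);
  return (((int64_t)h * 3600 + m * 60 + s) * 1000 + ms) * 1000;
}

int main() {
  // First log line of each switch (shortened; read from s1.log / s3.log).
  const char *s1_first = "[16:30:01.770] [bmv2] [D] ...";
  const char *s3_first = "[16:30:03.194] [bmv2] [D] ...";

  // The sink (s3) started this much later than the source (s1): 1.424 s here.
  int64_t startup_offset_us = log_ts_to_us(s3_first) - log_ts_to_us(s1_first);

  // Hypothetical ingress_global_timestamp readings, in microseconds since
  // each switch started (e.g. the source value carried in the INT header,
  // the sink value read locally).
  int64_t src_ingress_ts_us  = 7500123;
  int64_t sink_ingress_ts_us = 6076900;

  // The two readings have different epochs. Shifting the sink reading by
  // the startup offset puts both on the source switch's timeline:
  //   latency = (sink_ts + startup_offset) - src_ts
  int64_t latency_us =
      sink_ingress_ts_us + startup_offset_us - src_ingress_ts_us;

  std::printf("startup offset = %lld us, latency = %lld us\n",
              (long long)startup_offset_us, (long long)latency_us);
  return 0;
}
```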

However, I am noticing occasional problems with this approach. Sometimes, the final latency (after offset) comes out to be negative.

Question: Should I instead use the switch log lines for when the Thrift server was started to compute the time difference?

Switch 1 (Source)

[16:30:01.906] [bmv2] [I] [thread 4167] Thrift server was started

Switch 3 (Sink)

[16:30:03.387] [bmv2] [I] [thread 4200] Starting Thrift server on port 9092

These lines give a different timedelta of 1.481 seconds. Using this timedelta as the offset did not produce a negative latency for the particular failing example, but I am not sure whether this would always work.

Alternative Approach

Another approach would be to modify the BMv2 source code to emit the timestamp as local (wall-clock) time (perhaps using gettimeofday()), but I would like to avoid this method if possible.

Kind Regards
Kartik Ramesh

Hi,

I have also been looking at this; it is a tricky topic. There is a fantastic message here, as you probably already saw, which explains the exact case you are facing now.

I am not sure whether the ingress/egress timestamps give you a precise measurement. What I mean is that you should know at which exact point in the abstract model they are taken. For instance, the ingress and egress timestamps seem to be taken right before the packet is processed by the ingress and egress control blocks, respectively. So if Egress_S3 - Ingress_S1 works for you, that is fine :slight_smile: but keep in mind that you will not be measuring the egress block processing at S3 that happens right before the packet is deparsed. It might not be a big problem, but it is good to be aware of.

I have learned most of this information from Andy's and Antonin Bas' messages, but there are a few other people who might have a deeper understanding than mine. This message from Andy is also worth reading.

So the interesting question now is: what to do? To be honest, your process of reading the first log message is a good start. However, the log timestamps only have millisecond resolution while the BMv2 ingress/egress timestamps are in microseconds, so the offset derived from the logs can be off by more than the latency you are trying to measure, which is why some of your calculations come out negative. Besides, the delay between the moment the timestamp counter starts counting and the moment the first log message is written is not necessarily the same on every switch, which can cause problems too.

Unfortunately for you, I would go down the route of modifying the BMv2 code to call gettimeofday or something similar (as the README suggests). That should be the most accurate option when S1 and S3 run on the same machine.

Regarding your second question about Thrift: have you tried using both the first log message and the Thrift one? I think both cases will probably introduce errors, because you cannot be sure that each message is written to its switch's log file at the same delay after that switch started.

Maybe Andy @andyfingerhut has other ideas :slight_smile:


Your suggestions sound reasonable. You should get especially good time synchronization if you use multiple simple_switch processes running on the same host and have them all call gettimeofday, since they are all then reading from a perfectly synchronized clock source.

I do not have any suggestions not already mentioned in your message, or in the messages I have previously written that you linked to.


Hi, I have the same time synchronization problem. I use Mininet to build a BMv2 switch topology. If I want to call a method like gettimeofday, how should I implement it?

The function get_ts is called from many places within simple_switch to get values for metadata fields like ingress_global_timestamp, egress_global_timestamp, enq_timestamp, deq_timedelta, etc.

It is defined here: behavioral-model/simple_switch.cpp at 27c235944492ef55ba061fcf658b4d8102d53bd8 · p4lang/behavioral-model · GitHub

In your own local copy of the behavioral-model code, you could change the definition of that function to call gettimeofday() instead of what it does now.
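As a rough illustration (not a drop-in patch: the surrounding SimpleSwitch class is omitted, and ts_res is assumed to be the std::chrono::microseconds type used in simple_switch), a wall-clock version of that function could look like this:

```cpp
#include <sys/time.h>   // gettimeofday()
#include <chrono>
#include <cstdint>
#include <iostream>

// ts_res is assumed to be std::chrono::microseconds, as in simple_switch.
using ts_res = std::chrono::microseconds;

// Wall-clock variant of what get_ts() could return: microseconds since the
// Unix epoch instead of microseconds since this switch process started.
// Every simple_switch process on the same host then shares one reference.
ts_res get_wall_clock_ts() {
  struct timeval tv;
  gettimeofday(&tv, nullptr);
  return ts_res(static_cast<int64_t>(tv.tv_sec) * 1000000 + tv.tv_usec);
}

int main() {
  // Caveat: the value no longer fits in v1model's 48-bit
  // ingress_global_timestamp field, so only the low 48 bits reach the
  // metadata; differences between two switches are still meaningful.
  std::cout << get_wall_clock_ts().count() << " us since the Unix epoch\n";
  return 0;
}
```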

Given that multiple people have shown interest in this over time, it seems worthwhile to provide a command-line option when starting simple_switch that would cause get_ts to either return its current value (the default) or some other time value, e.g. nanoseconds, microseconds, or milliseconds since the Unix epoch.