How to simulate a customized propagation delay?

I’m working on a project to implement a customized propagation delay on Tofino2. I’ve found very few references online, and it seems that P4 doesn’t really support this natively either.
Is it feasible to send packets to the CPU and process them with a Python script? My idea is to receive a packet in the script, sleep() for the desired delay, and then send it back to the switch. Would packets arriving from the switch during that sleep() fail to be picked up by the script? Alternatively, would it be feasible to spawn a thread for each incoming packet? Or is there a way for a Python script running on the Tofino2 to read the CPU clock, record when each packet arrived, and determine exactly when to pass it back to the switch?
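To sketch the "thread per packet" idea: instead of calling sleep() in the receive loop (which would indeed block the script from picking up packets that arrive in the meantime), each packet can be handed to a timer thread that re-sends it after the delay, leaving the receive loop free. This is a minimal stdlib-only sketch; `send_fn` stands in for whatever actually injects the packet back toward the switch (e.g. a raw socket on the CPU port), which is not shown here. Note that user-space timers typically jitter by hundreds of microseconds to milliseconds under load, so hitting a tight tolerance this way is not guaranteed.

```python
import threading
import time

def schedule_resend(packet, delay_s, send_fn):
    """Re-send `packet` via `send_fn` after `delay_s` seconds,
    without blocking the caller (the receive loop keeps running)."""
    t = threading.Timer(delay_s, send_fn, args=(packet,))
    t.daemon = True
    t.start()
    return t

# Demo with a dummy send function that records when it fired.
sent = []
start = time.monotonic()
for i in range(3):
    # the receive loop returns immediately; each packet waits ~50 ms on its own timer
    schedule_resend(f"pkt{i}", 0.05, lambda p: sent.append((p, time.monotonic() - start)))
time.sleep(0.3)
for pkt, elapsed in sent:
    # Timer guarantees at least the requested delay; the overshoot is the jitter
    assert elapsed >= 0.05
```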
I would appreciate it if you would read or respond to my post!

What length of customized delays, and what error tolerance on those delays, were you hoping to achieve?

For example, adding a delay in the range of 5 seconds plus or minus 0.1 seconds is much easier to achieve than adding one in the range of 5 microseconds plus or minus 10 nanoseconds.

Another thing to consider is the traffic rate you need to sustain while adding the delay.

Thanks for your reply! I’m aiming for a customized delay of roughly 50 ms, with an error tolerance of about 1%.

I would suggest testing netem with virtual interfaces on the Linux side. If you are going to involve the host CPU anyway, I would guess that a Linux qdisc is more performant than a user-space script (unless you use kernel bypass).

If you use multiple queues on the Linux side and expose the CPU port as an interface, netem should be (fairly) accurate, depending on your overall traffic rate.
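For reference, a fixed 50 ms netem delay on an interface looks like this (the interface name `enp3s0f0` is just a placeholder for whatever the CPU port is exposed as on your system; the commands need root):

```shell
# add ~50 ms of fixed delay to all egress traffic on the interface
tc qdisc add dev enp3s0f0 root netem delay 50ms

# inspect the qdisc that is installed
tc qdisc show dev enp3s0f0

# remove it again when done
tc qdisc del dev enp3s0f0 root
```

netem also accepts an optional jitter argument (e.g. `delay 50ms 1ms`) if you want to experiment with the tolerance side of the question.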

Thanks for your reply! That gave me a lot to work with; I’ll try it out.