Debugging P4 code in general

Hello everyone,

Good morning and hope you all are doing well.

While implementing an IPv4 checksum, I needed to see what was being stored in my intermediate variables. After going through the output log files and the internal structures of the code, I found that the fields of the user metadata structure are printed in the output log every time the structure is processed during packet processing. So I added a field to it for that purpose (debug_chksum in this case).

// User metadata structure
struct metadata {
    bit<16> tuser_size;
    bit<16> tuser_src;
    bit<16> tuser_dst;
    //bit<32> debug_chksum;  // Debug only: to check the value of the checksum in the output logs
}

When I assigned an intermediate value to meta.debug_chksum inside the MyProcessing control, it was printed in the output log files as expected. That helped me a lot with the task I was assigned, and it could be useful for other purposes too, so I decided to post it here.
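For anyone who wants to try the same trick, here is a minimal sketch of what such an assignment could look like. It is not the exact code from my project: the control name DebugChecksum, its signature, and the header field names are assumptions for illustration, and the real signature depends on the architecture. It assumes an IPv4 header without options.

```p4
#include <core.p4>

// Standard IPv4 header without options (20 bytes).
header ipv4_t {
    bit<4>  version;
    bit<4>  ihl;
    bit<8>  diffserv;
    bit<16> totalLen;
    bit<16> identification;
    bit<3>  flags;
    bit<13> fragOffset;
    bit<8>  ttl;
    bit<8>  protocol;
    bit<16> hdrChecksum;
    bit<32> srcAddr;
    bit<32> dstAddr;
}

struct headers {
    ipv4_t ipv4;
}

struct metadata {
    bit<32> debug_chksum;  // debug only: watched in the output logs
}

// Hypothetical control; the real signature is architecture-specific.
control DebugChecksum(inout headers hdr, inout metadata meta) {
    apply {
        if (hdr.ipv4.isValid()) {
            // Sum all 16-bit words of the header except the checksum field.
            bit<32> sum =
                (bit<32>)(hdr.ipv4.version ++ hdr.ipv4.ihl ++ hdr.ipv4.diffserv)
              + (bit<32>)(hdr.ipv4.totalLen)
              + (bit<32>)(hdr.ipv4.identification)
              + (bit<32>)(hdr.ipv4.flags ++ hdr.ipv4.fragOffset)
              + (bit<32>)(hdr.ipv4.ttl ++ hdr.ipv4.protocol)
              + (bit<32>)(hdr.ipv4.srcAddr[31:16])
              + (bit<32>)(hdr.ipv4.srcAddr[15:0])
              + (bit<32>)(hdr.ipv4.dstAddr[31:16])
              + (bit<32>)(hdr.ipv4.dstAddr[15:0]);

            // Expose the unfolded sum so it can be read from the logs.
            meta.debug_chksum = sum;

            // Fold the carries back in (twice) and take the one's complement.
            sum = (sum & 0xFFFF) + (sum >> 16);
            sum = (sum & 0xFFFF) + (sum >> 16);
            hdr.ipv4.hdrChecksum = ~(sum[15:0]);
        }
    }
}
```

Copying the value into the metadata before the fold makes it possible to compare the raw sum against a hand calculation for a known packet.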

That said, I was wondering whether there are other techniques for debugging P4 code, be it in the processing or the parsing pipeline. If there are, I would be really grateful if someone could share them in this thread. I tried Googling first but haven’t found anything useful so far.

Thanks and regards,

Hello Sandeep,

In theory, you should be able to pack all the intermediate values you’d like to see into a custom header, attach it to the packet, send the packet to a port, capture it and inspect the values. This is as close to a “generic” P4 facility as you can get, although it still requires a little more than the core language and its standard library, since they do not provide any means to send a packet out, only to form the headers.

In reality, this method has many limitations and is pretty intrusive. First of all, packing data into a special debug header can be quite costly. Second, in order to observe the packet you might need to change its destination (especially when the packet was intended to be dropped in the first place). Overall, this requires you to change the source code of the program quite a bit, and that’s a problem by itself, especially on high-speed targets: the resulting program might not fit, the compilation might turn out differently, etc. Of course, we know that it is impossible to observe or measure anything without introducing at least some disturbance to the system (sometimes the bug “disappears” after you add printf()s to your program to debug it), but this is a whole different level.
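The debug-header idea described above can be sketched roughly as follows, using only the core language. The header debug_t, its fields, and the control names AttachDebug and DebugDeparser are hypothetical. The sketch assumes the debug header is placed right after Ethernet and announced with 0x88B5, an EtherType set aside by IEEE for local experimental use. Choosing the output port is architecture-specific and is not shown.

```p4
#include <core.p4>

// EtherType reserved by IEEE for local experimental use.
const bit<16> ETHERTYPE_DEBUG = 0x88B5;

header ethernet_t {
    bit<48> dstAddr;
    bit<48> srcAddr;
    bit<16> etherType;
}

// Hypothetical debug header carrying the values to inspect.
header debug_t {
    bit<32> chksum;         // intermediate value under investigation
    bit<16> origEtherType;  // lets a capture tool find the real payload
}

struct headers {
    ethernet_t ethernet;
    debug_t    debug;
}

struct metadata {
    bit<32> debug_chksum;
}

// Hypothetical control; call it at the end of the processing pipeline.
control AttachDebug(inout headers hdr, in metadata meta) {
    apply {
        if (hdr.ethernet.isValid()) {
            hdr.debug.setValid();
            hdr.debug.chksum = meta.debug_chksum;
            hdr.debug.origEtherType = hdr.ethernet.etherType;
            hdr.ethernet.etherType = ETHERTYPE_DEBUG;
        }
    }
}

control DebugDeparser(packet_out pkt, in headers hdr) {
    apply {
        pkt.emit(hdr.ethernet);
        pkt.emit(hdr.debug);  // emits nothing when the header is invalid
        // ... emit the remaining headers of the program here
    }
}
```

Because the value ends up in the emitted packet, the assignment has an observable effect, which is also what makes this approach more robust than writing to otherwise unused metadata.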

Many architectures do provide other methods to get the data out without disturbing the original packet flow as much, such as cloning (mirroring), digests, etc. They have their own limitations as well, but overall are a lot better suited for debugging. Note that these facilities are highly architecture-specific.
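As one illustration of such facilities, here is a minimal sketch that assumes the v1model architecture (the one used by the BMv2 simple_switch), not the Xilinx one. digest() and clone() are externs declared in v1model.p4; the struct debug_digest_t, the control name DebugIngress, the receiver number, and the mirror session number are placeholders chosen for this example.

```p4
#include <core.p4>
#include <v1model.p4>

// Hypothetical payload of the debug digest.
struct debug_digest_t {
    bit<32> chksum;
}

struct headers { }

struct metadata {
    bit<32>        debug_chksum;
    debug_digest_t dbg;
}

// Placeholder identifiers; both must match the control-plane setup.
const bit<32> DEBUG_DIGEST_RECEIVER = 1;
const bit<32> DEBUG_MIRROR_SESSION  = 250;

control DebugIngress(inout headers hdr,
                     inout metadata meta,
                     inout standard_metadata_t standard_metadata) {
    apply {
        // Send the value to the control plane without touching the packet.
        meta.dbg.chksum = meta.debug_chksum;
        digest<debug_digest_t>(DEBUG_DIGEST_RECEIVER, meta.dbg);

        // Mirror a copy of the packet to the port bound to this session.
        clone(CloneType.I2E, DEBUG_MIRROR_SESSION);
    }
}
```

The original packet continues on its normal path in both cases. The mirror session has to be configured from the control plane before any cloned packets appear, and a listener has to be attached to receive the digests.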

In addition to that, many targets provide debugging facilities that allow you to observe the program from the “outside”.

First, there are log files, provided by most (if not all) models. They are extremely helpful, but they have their own limitations. For starters, a lot depends on the faithfulness of the model. For example, behavioral models can help you find a logical issue with your program, but are mostly useless for debugging an issue related to incorrect compilation for a real target. Register-accurate models usually help with that, but might not be able to debug a problem related to a hardware bug. Cycle-accurate models can help you with the last problem, but might be extremely slow and expensive to run (and usually produce an enormous amount of information too). Also, many models do not fully model the device’s fixed-function components, so if the problem is somewhere in that area, they might not help either.
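On a behavioral model, some architectures also let the program write to the log directly. A minimal sketch, again assuming v1model on BMv2 rather than the Xilinx toolchain: log_msg() is an extern declared in v1model.p4, while the control name LogDebug is hypothetical. Whether the messages show up depends on how the model was built and on the logging options it was started with.

```p4
#include <core.p4>
#include <v1model.p4>

struct headers { }

struct metadata {
    bit<32> debug_chksum;
}

control LogDebug(inout headers hdr,
                 inout metadata meta,
                 inout standard_metadata_t standard_metadata) {
    apply {
        // Each {} placeholder is replaced by the matching list element.
        log_msg("debug_chksum = {}", {meta.debug_chksum});
        log_msg("ingress_port = {}, debug_chksum = {}",
                {standard_metadata.ingress_port, meta.debug_chksum});
    }
}
```

This only helps with logical issues in the program; a hardware target generally has no equivalent of this call.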

Last, but not least, some targets provide special debug facilities. Those are often extremely useful and powerful, but also very target-dependent. They can be as simple as counters that account for all kinds of events as the packet passes through the pipeline and as complex as special tracing facilities. Intel Tofino, for example, has both and we do teach those (and more) in our classes. However, given that you seem to be working with Xilinx (AMD) FPGAs, they will not be useful to you.

As for your specific method (assigning the values in question to some temporary metadata and watching the logs), again, a lot will depend on the specific target. For example, if these assignments have no further effect on the execution, an optimizing P4 compiler will probably remove them as part of a dead code elimination pass. That is why it is better to pack these variables into a header that is then attached to the packet.

Happy hacking,
