I’ve seen similar questions to this, but none have quite answered my question (I don’t think). I’m performing some basic research and I need a way to manipulate TCP payload data in P4.
I’d like to parse TCP packets, analyze part of the payload, and then (if I detect a specific pattern) manipulate some of the payload data. At this stage, it doesn’t matter which part of the payload I manipulate, and I’m not quite sure what pattern I will be searching for (assume an arbitrary number of bytes). We just need to know that we can detect/manipulate payload data, and how to do that.
I’ve been looking at the example TCP Options Parser (p4-guide/tcp-options-parser.p4 at master · jafingerhut/p4-guide · GitHub) and, although I don’t need the options parser, I think it might be a good starting point (point me elsewhere if I’m on the wrong path). From what I’ve read on this forum, I believe I just need to parse the whole TCP header and then my pointer will be at the beginning of the payload. So I imagine I’d just need to add an extra state to parse the payload (instead of going to accept) and my detector/manipulator logic would go into that parser.
Everyone was new to P4 at some point. You are more than welcome to ask as many questions as you want or need, no matter how simple or complex.
I think that manipulating the payload of a packet is a very complex task in P4. I would venture to say that payload manipulation was not a primary objective of P4, and the language does not seem to be designed for it. For example, there are no for loops, unless you count the loop-like behavior you get when extracting a stack of MPLS headers or other similar headers. Still, there is some room for application-layer manipulation. In the end, we are just extracting and manipulating bytes.
In general, you will find more flexibility in software switches (like bmv2) than in hardware switches, where memory is scarce. I will assume you are using something like bmv2. In the past, I remember writing code to detect HTTP methods, and possibly to modify some bytes of some packets too. I would recommend you define a use case that is not too far from what you ultimately expect but is still doable in P4. For instance, one case would be:
Always work with fixed-size headers. You can work with 10, 20, or 30 bytes at a time, but I would avoid variable-size headers. As far as I remember, they cannot be manipulated, only extracted and then discarded or emitted. I also remember someone stating that these headers are very inefficient on hardware devices.
Assume, for instance, that your relevant payload is the first 20 bytes after the TCP or UDP header. It could also be 10 bytes that follow 10 initial bytes that are not useful or relevant. You can also use tools like lookahead<T>() to check the next bytes before extracting them, or advance() to skip ahead (advance() takes the number of bits to skip as an argument, not a type parameter).
Pick an easy way to detect a pattern, say, checking that bytes 3 and 4 are 0xFFFF (with a table), and then write an action to alter the bytes you want.
Assume there are no TCP options. You can start with UDP (take a look at the Scapy library in Python for crafting test packets). If you cannot work with UDP or ignore the options, you can find some code that extracts or skips over them.
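The simple case described above could be sketched roughly as follows for bmv2 (v1model). This is only an illustration under several assumptions of mine: the payload_t layout, the table name detect, the action name rewrite, the all-zero replacement value, and forwarding everything to port 1 are all hypothetical. It also sets the UDP checksum to zero (which IPv4 permits, meaning "not computed") instead of recomputing it.

```p4
#include <core.p4>
#include <v1model.p4>

header ethernet_t { bit<48> dstAddr; bit<48> srcAddr; bit<16> etherType; }
header ipv4_t {
    bit<4>  version;        bit<4> ihl;   bit<8>  diffserv; bit<16> totalLen;
    bit<16> identification; bit<3> flags; bit<13> fragOffset;
    bit<8>  ttl;            bit<8> protocol; bit<16> hdrChecksum;
    bit<32> srcAddr;        bit<32> dstAddr;
}
header udp_t { bit<16> srcPort; bit<16> dstPort; bit<16> len; bit<16> checksum; }

// Hypothetical fixed-size view of the first 20 payload bytes.
header payload_t {
    bit<16>  skipped;   // bytes 1-2, not inspected
    bit<16>  pattern;   // bytes 3-4, the pattern we look for
    bit<128> data;      // bytes 5-20, the part we may overwrite
}

struct headers_t { ethernet_t eth; ipv4_t ipv4; udp_t udp; payload_t payload; }
struct meta_t { }

parser MyParser(packet_in pkt, out headers_t hdr, inout meta_t meta,
                inout standard_metadata_t sm) {
    state start {
        pkt.extract(hdr.eth);
        transition select(hdr.eth.etherType) { 0x0800: parse_ipv4; default: accept; }
    }
    state parse_ipv4 {
        pkt.extract(hdr.ipv4);
        // Only handle IPv4 without options (ihl == 5) carrying UDP (protocol 17).
        transition select(hdr.ipv4.ihl, hdr.ipv4.protocol) {
            (5, 17): parse_udp;
            default: accept;
        }
    }
    state parse_udp {
        pkt.extract(hdr.udp);
        transition parse_payload;
    }
    state parse_payload {
        // Treat the first 20 payload bytes as one more fixed-size header.
        // Packets with a shorter payload raise a parser error here;
        // handling that case is omitted in this sketch.
        pkt.extract(hdr.payload);
        transition accept;
    }
}

control MyVerify(inout headers_t hdr, inout meta_t meta) { apply { } }

control MyIngress(inout headers_t hdr, inout meta_t meta,
                  inout standard_metadata_t sm) {
    action rewrite(bit<128> new_data) {
        hdr.payload.data = new_data;
        hdr.udp.checksum = 0;  // assumption: rely on IPv4 allowing a zero UDP checksum
    }
    table detect {
        key = { hdr.payload.pattern : exact; }
        actions = { rewrite; NoAction; }
        default_action = NoAction();
        // Bytes 3 and 4 equal to 0xFFFF -> overwrite bytes 5-20 with zeros.
        const entries = { 0xFFFF : rewrite(128w0); }
    }
    apply {
        sm.egress_spec = 1;    // hypothetical: send every packet out of port 1
        if (hdr.payload.isValid()) { detect.apply(); }
    }
}

control MyEgress(inout headers_t hdr, inout meta_t meta,
                 inout standard_metadata_t sm) { apply { } }
control MyCompute(inout headers_t hdr, inout meta_t meta) { apply { } }

control MyDeparser(packet_out pkt, in headers_t hdr) {
    apply {
        // Valid headers are emitted in this order; the unparsed rest follows.
        pkt.emit(hdr.eth);
        pkt.emit(hdr.ipv4);
        pkt.emit(hdr.udp);
        pkt.emit(hdr.payload);
    }
}

V1Switch(MyParser(), MyVerify(), MyIngress(), MyEgress(), MyCompute(), MyDeparser()) main;
```

Because no IPv4 field changes and the payload keeps its size, the IPv4 header checksum and the length fields can stay as they are in this sketch. Sending a UDP packet whose payload has 0xFFFF in bytes 3 and 4 and checking the output in Wireshark would be one way to confirm the behavior.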
Once you build a case like this, it will be easier to add complexity to your use case until you reach a point where you either have to send the packet to the controller or cannot do it in the data plane using P4. Sometimes you can recirculate the packet for another round of processing, but that is also not very efficient. How many times can you recirculate/clone/resubmit a packet without experiencing noticeable delay for your service?
The TCP example is quite good; it explains your case well. But I would choose a simple path first. Extract the header directly (20 bytes), assuming no TCP options. Check some bytes with a table and manipulate them. Emit the packet and make sure it works as expected (use Wireshark). From that point, try to add more complex steps (or ask about them here) until you reach a limitation. For instance, (1) you can start extracting more bytes (this limitation mostly applies to hardware switches). You can also (2) check more bytes in the table and manipulate more bytes at the same time. You can also (3) call setInvalid() on the header you extracted or add another, bigger one. This is not an easy task, since you have to update the length fields in IP and TCP, the checksums, etc. The In-band Network Telemetry (INT) examples around GitHub will help you. Here is (link) an example of what I mean in one of my public repos. You can also (4) manipulate data in a malicious way (if security is your field), as long as you work with unencrypted packets.
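To illustrate step (3), here is a rough sketch of what adding a header involves on bmv2 (v1model). It uses UDP rather than TCP to stay short, and the extra_t header, its tag value, and forwarding to port 1 are hypothetical names and values of mine. Once the packet grows, the IPv4 total length and the UDP length have to grow with it, and the IPv4 header checksum has to be recomputed. Here the UDP checksum is simply zeroed, which IPv4 allows. TCP has no such escape hatch, so its checksum would need a real recomputation.

```p4
#include <core.p4>
#include <v1model.p4>

header ethernet_t { bit<48> dstAddr; bit<48> srcAddr; bit<16> etherType; }
header ipv4_t {
    bit<4>  version;        bit<4> ihl;   bit<8>  diffserv; bit<16> totalLen;
    bit<16> identification; bit<3> flags; bit<13> fragOffset;
    bit<8>  ttl;            bit<8> protocol; bit<16> hdrChecksum;
    bit<32> srcAddr;        bit<32> dstAddr;
}
header udp_t { bit<16> srcPort; bit<16> dstPort; bit<16> len; bit<16> checksum; }

// Hypothetical 4-byte header inserted in front of the original payload.
header extra_t { bit<32> tag; }

struct headers_t { ethernet_t eth; ipv4_t ipv4; udp_t udp; extra_t extra; }
struct meta_t { }

parser MyParser(packet_in pkt, out headers_t hdr, inout meta_t meta,
                inout standard_metadata_t sm) {
    state start {
        pkt.extract(hdr.eth);
        transition select(hdr.eth.etherType) { 0x0800: parse_ipv4; default: accept; }
    }
    state parse_ipv4 {
        pkt.extract(hdr.ipv4);
        transition select(hdr.ipv4.ihl, hdr.ipv4.protocol) {
            (5, 17): parse_udp;
            default: accept;
        }
    }
    state parse_udp {
        pkt.extract(hdr.udp);
        transition accept;
    }
}

control MyVerify(inout headers_t hdr, inout meta_t meta) { apply { } }

control MyIngress(inout headers_t hdr, inout meta_t meta,
                  inout standard_metadata_t sm) {
    apply {
        sm.egress_spec = 1;                 // hypothetical: always use port 1
        if (hdr.udp.isValid()) {
            hdr.extra.setValid();           // the deparser will now emit it
            hdr.extra.tag = 0xCAFEBABE;     // arbitrary illustrative value
            // The packet is 4 bytes longer, so both length fields must grow.
            hdr.ipv4.totalLen = hdr.ipv4.totalLen + 4;
            hdr.udp.len = hdr.udp.len + 4;
            hdr.udp.checksum = 0;           // assumption: zero UDP checksum over IPv4
        }
    }
}

control MyEgress(inout headers_t hdr, inout meta_t meta,
                 inout standard_metadata_t sm) { apply { } }

control MyCompute(inout headers_t hdr, inout meta_t meta) {
    apply {
        // totalLen changed, so the IPv4 header checksum has to be recomputed.
        update_checksum(hdr.ipv4.isValid(),
            { hdr.ipv4.version, hdr.ipv4.ihl, hdr.ipv4.diffserv, hdr.ipv4.totalLen,
              hdr.ipv4.identification, hdr.ipv4.flags, hdr.ipv4.fragOffset,
              hdr.ipv4.ttl, hdr.ipv4.protocol, hdr.ipv4.srcAddr, hdr.ipv4.dstAddr },
            hdr.ipv4.hdrChecksum, HashAlgorithm.csum16);
    }
}

control MyDeparser(packet_out pkt, in headers_t hdr) {
    apply {
        pkt.emit(hdr.eth);
        pkt.emit(hdr.ipv4);
        pkt.emit(hdr.udp);
        pkt.emit(hdr.extra);   // new bytes land right after the UDP header
    }
}

V1Switch(MyParser(), MyVerify(), MyIngress(), MyEgress(), MyCompute(), MyDeparser()) main;
```

Removing bytes works the other way around: call setInvalid() on a header that was extracted and subtract its size from the same length fields.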
Answering specifically this:
If you are parsing TCP options and then want to extract your payload, you can just use transition extract_payload; as the default transition out of the (last) TCP option parsing state(s), so that it is taken when there are no more TCP option headers left to extract.
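As a rough sketch of that idea, the parser below reaches an extract_payload state once the TCP options are out of the way. Note that it does not parse options one by one as tcp-options-parser.p4 does. As a simpler stand-in, it consumes all of them as a single varbit blob sized from the data offset field. The tcp_options_t and payload_t headers, the 20-byte payload size, and forwarding to port 1 are assumptions of mine.

```p4
#include <core.p4>
#include <v1model.p4>

header ethernet_t { bit<48> dstAddr; bit<48> srcAddr; bit<16> etherType; }
header ipv4_t {
    bit<4>  version;        bit<4> ihl;   bit<8>  diffserv; bit<16> totalLen;
    bit<16> identification; bit<3> flags; bit<13> fragOffset;
    bit<8>  ttl;            bit<8> protocol; bit<16> hdrChecksum;
    bit<32> srcAddr;        bit<32> dstAddr;
}
header tcp_t {
    bit<16> srcPort; bit<16> dstPort; bit<32> seqNo; bit<32> ackNo;
    bit<4>  dataOffset; bit<4> res; bit<8> flags;
    bit<16> window;  bit<16> checksum; bit<16> urgentPtr;
}
header tcp_options_t { varbit<320> raw; }   // up to 40 bytes of options, unparsed
header payload_t     { bit<160> data; }     // hypothetical: first 20 payload bytes

struct headers_t {
    ethernet_t eth; ipv4_t ipv4; tcp_t tcp;
    tcp_options_t tcp_options; payload_t payload;
}
struct meta_t { }

parser MyParser(packet_in pkt, out headers_t hdr, inout meta_t meta,
                inout standard_metadata_t sm) {
    state start {
        pkt.extract(hdr.eth);
        transition select(hdr.eth.etherType) { 0x0800: parse_ipv4; default: accept; }
    }
    state parse_ipv4 {
        pkt.extract(hdr.ipv4);
        transition select(hdr.ipv4.ihl, hdr.ipv4.protocol) {
            (5, 6): parse_tcp;              // IPv4 without options, carrying TCP
            default: accept;
        }
    }
    state parse_tcp {
        pkt.extract(hdr.tcp);
        // dataOffset counts 32-bit words; anything below 5 is malformed.
        verify(hdr.tcp.dataOffset >= 5, error.HeaderTooShort);
        transition select(hdr.tcp.dataOffset) {
            5: extract_payload;             // no options at all
            default: parse_tcp_options;
        }
    }
    state parse_tcp_options {
        // Consume every option byte in one step: (dataOffset - 5) words of 32 bits.
        pkt.extract(hdr.tcp_options, (bit<32>)(hdr.tcp.dataOffset - 5) * 32);
        transition extract_payload;         // options done, payload comes next
    }
    state extract_payload {
        // Segments carrying fewer than 20 payload bytes (e.g. pure ACKs)
        // raise a parser error here; handling that is omitted in this sketch.
        pkt.extract(hdr.payload);
        transition accept;
    }
}

control MyVerify(inout headers_t hdr, inout meta_t meta) { apply { } }
control MyIngress(inout headers_t hdr, inout meta_t meta,
                  inout standard_metadata_t sm) {
    apply { sm.egress_spec = 1; }           // hypothetical: always use port 1
}
control MyEgress(inout headers_t hdr, inout meta_t meta,
                 inout standard_metadata_t sm) { apply { } }
control MyCompute(inout headers_t hdr, inout meta_t meta) { apply { } }

control MyDeparser(packet_out pkt, in headers_t hdr) {
    apply {
        pkt.emit(hdr.eth);
        pkt.emit(hdr.ipv4);
        pkt.emit(hdr.tcp);
        pkt.emit(hdr.tcp_options);          // emitted only if it was extracted
        pkt.emit(hdr.payload);
    }
}

V1Switch(MyParser(), MyVerify(), MyIngress(), MyEgress(), MyCompute(), MyDeparser()) main;
```

With per-option states instead of the blob, the same transition extract_payload; would sit in the state that decides no option bytes remain. This sketch only parses and re-emits the packet. Any change to hdr.payload would also require updating the TCP checksum.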
Thank you @ederollora, this is very helpful and encouraging. I have started down this path and I think I have the environment set up for success. So now on to actually manipulating some data. I will let you know how it goes.
Edit: To answer your question, right now I am using the multi-switch target in the p4App, which I believe is bmv2. And then we will probably implement the same architecture on hardware to start out.