Extract payload from packet

Hi,

I’m wondering if it is feasible to extract a payload (such as TCP/UDP payload) from the packet. I tried to define the payload by creating a new header but failed since the size of the payload varies. Thank you very much.

Hi svenchen,

You can capture the whole payload with a varbit header, but you lose granularity: it is difficult to break the payload down into individual fields the way you do with header fields. If you want to parse the payload outside the P4 pipeline, you can clone the packets and then truncate the cloned packet; this way you extract the payload and can parse it outside the switch. If there is a “repeated pattern” in the payload, you can instead extract it in the P4 parser, as is done for MPLS/VLAN label tags (you can find some examples online).
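To illustrate the “repeated pattern” case, here is a minimal sketch of a parser that loops over a header stack, in the style commonly used for MPLS labels. The record layout, the names `label_t`, `headers_t` and `LabelParser`, and the stack depth of 4 are assumptions made for the example; it is a parser fragment rather than a complete program.

```p4
#include <core.p4>

// Hypothetical 4-byte record that repeats in the payload (MPLS-like layout).
header label_t {
    bit<20> value;
    bit<3>  tc;
    bit<1>  bos;   // bottom-of-stack flag: 1 marks the last record
    bit<8>  ttl;
}

struct headers_t {
    label_t[4] labels;   // the stack depth must be a compile-time constant
}

parser LabelParser(packet_in packet, out headers_t hdr) {
    state start {
        transition parse_label;
    }

    state parse_label {
        // .next refers to the next free element of the stack.
        packet.extract(hdr.labels.next);
        // .last refers to the element that was just extracted.
        transition select(hdr.labels.last.bos) {
            1: accept;
            default: parse_label;   // loop until the bottom-of-stack record
        }
    }
}
```

If a packet carries more records than the declared depth, the extra `extract` on the full stack raises a parser error, so the depth has to be chosen for the worst case you want to support.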

I do agree with @DavideS. I remember seeing this topic in plenty of GitHub repo issues, on the P4 dev mailing list and in Slack. In general terms, P4 was never created with the purpose of parsing or analyzing the payload, as far as I remember reading from a senior colleague. Consider that most of the headers you parse will always have the same size, unless you also focus on things like the INT metadata stack, IP/TCP options or an MPLS stack of headers. And even with headers like MPLS, you are always bound to parse at most a certain number of headers. You cannot define a stack without stating its size beforehand.

I think it all comes down to the question: what do you want to parse, precisely?

  • If you want to parse one byte after transport headers, then no problem, that is achievable.

  • Do you want to parse 10 bytes (and always 10) from payload, even though the payload is bigger? That’s also possible.

  • Do you want to parse 700 bytes from payload? This might be possible in some targets (software switches) but likely not in hardware switches (memory).

  • Do you want to parse a variable size header (payload)? That is possible in bmv2 (I think Tofino supports it now, right?). However, this is a very inefficient way of parsing headers because of the flexibility that you need. Consider that software switches will always accommodate more complex operations compared to hardware targets. This is because of the lack of memory (for this particular purpose) in production hardware switches. Besides, last time I checked about variable size headers, you could not use their fields as table keys.
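As an illustration of the fixed-size case from the list above (always 10 bytes, even when the payload is longer), here is a minimal parser sketch. The names `payload_prefix_t`, `headers_t` and `PrefixParser` are made up for the example, and it assumes the parser cursor already sits at the UDP header; in a real program this state would follow Ethernet and IPv4 parsing.

```p4
#include <core.p4>

header udp_t {
    bit<16> srcPort;
    bit<16> dstPort;
    bit<16> length;     // UDP header (8 bytes) plus payload, in bytes
    bit<16> checksum;
}

// Hypothetical header covering the first 10 bytes of the UDP payload.
header payload_prefix_t {
    bit<80> data;
}

struct headers_t {
    udp_t            udp;
    payload_prefix_t payload_prefix;
}

parser PrefixParser(packet_in packet, out headers_t hdr) {
    state start {
        packet.extract(hdr.udp);
        // Fewer than 8 + 10 bytes means the payload is too short to hold the prefix.
        transition select(hdr.udp.length) {
            0 .. 17: accept;
            default: parse_prefix;
        }
    }

    state parse_prefix {
        // Always takes exactly 80 bits; the rest of the payload stays unparsed.
        packet.extract(hdr.payload_prefix);
        transition accept;
    }
}
```

Because `data` is a fixed-width `bit<80>` field, it behaves like any other header field after parsing.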

In any case, you can always be more flexible with bmv2, so you might be able to parse the payload like Davide mentions (as if it were an MPLS header), which is very similar to parsing INT metadata.

Cheers,


Thanks for your reply. It’s of great help. I have a follow-up question.

I’m new to the P4 language paradigm. Based on my understanding, the headers struct is defined before the parser and the match-action stages. Is it possible to define a header in the middle of the match-action stage? For example, we cannot define the UDP payload as a header since the size varies. Is it feasible to parse the UDP header first, extract the length, then define a “payload header” based on the length and parse it?

If it’s not feasible in P4, can we extract a sequence of bits outside the header? For example, we want to extract 16 bits after the UDP header (the 16 bits are not defined in the header). Thank you!

Hi @svenchen ,

That’s actually a good question. I have never tried to define headers within the apply { } block of a control or an action. As I understand it, you have to define headers as part of the struct headers_t { } that you declare at the beginning. If you do not do that, then you cannot access that header like you do with TCP or IPv4 (hdr.tcp or hdr.ipv4).

You do not need to do that because the length of the payload can be derived from the IP header (hdr.ipv4.totalLen), which is available in the parser. Then you just need to tell the parser how many bytes out of hdr.ipv4.totalLen have to be extracted after UDP (possibly hdr.ipv4.totalLen - 20 - 8, assuming an IPv4 header without options). Therefore, you can define a variable size header in the header struct and parse it. This is possible in bmv2 Simple Switch and probably in other targets too. However, in the bmv2 Simple Switch case, you cannot modify the header, at least not the last time I worked with it. You can only parse it and then decide whether to deparse it or not.
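The idea above can be sketched as a parser fragment like the following. The names `udp_payload_t`, `headers_t` and `PayloadParser` are hypothetical, the `varbit` bound assumes a 1500-byte MTU, and the fragment only handles IPv4 headers without options (`ihl == 5`). Note that the length argument of the two-argument `extract` is expressed in bits, so the byte count is shifted left by 3.

```p4
#include <core.p4>

header ipv4_t {
    bit<4>  version;
    bit<4>  ihl;
    bit<8>  diffserv;
    bit<16> totalLen;        // IPv4 header + UDP header + payload, in bytes
    bit<16> identification;
    bit<3>  flags;
    bit<13> fragOffset;
    bit<8>  ttl;
    bit<8>  protocol;
    bit<16> hdrChecksum;
    bit<32> srcAddr;
    bit<32> dstAddr;
}

header udp_t {
    bit<16> srcPort;
    bit<16> dstPort;
    bit<16> length;
    bit<16> checksum;
}

// Hypothetical payload header: at most 1500 - 20 - 8 = 1472 bytes = 11776 bits.
header udp_payload_t {
    varbit<11776> data;
}

struct headers_t {
    ipv4_t        ipv4;
    udp_t         udp;
    udp_payload_t payload;
}

parser PayloadParser(packet_in packet, out headers_t hdr) {
    state start {
        packet.extract(hdr.ipv4);
        // Assumption: only UDP (protocol 17) over IPv4 without options.
        transition select(hdr.ipv4.ihl, hdr.ipv4.protocol) {
            (5, 17): parse_udp;
            default: accept;
        }
    }

    state parse_udp {
        packet.extract(hdr.udp);
        // Payload bytes = totalLen - 20 (IPv4) - 8 (UDP); << 3 converts bytes to bits.
        // The cast comes first so the arithmetic is done on 32 bits.
        packet.extract(hdr.payload, ((bit<32>) hdr.ipv4.totalLen - 28) << 3);
        transition accept;
    }
}
```

A `totalLen` smaller than 28 or a payload larger than the `varbit` bound would make the extraction fail with a parser error, so a real program would probably guard against both.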

For instance, you do this procedure to extract the INT metadata stack in In-band Network Telemetry (INT) use cases. You calculate the variable header size from another header’s field (INT shim header) and let the target know how many bytes you want to extract:


/* switch internal variables for INT logic implementation */
header int_metadata_stack_t {
    varbit<8064> data;
}

(...)

state parse_int_metadata_stack { //P4apps, Joghwan
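    // (len - 3) is a count of 4-byte words; << 5 converts it to bits.
    // Cast to bit<32> before the arithmetic so a narrow len field cannot overflow.
    packet.extract(hdr.int_metadata_stack, ((bit<32>) hdr.int_shim.len - 3) << 5);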
    transition accept;
}

You can define an additional header (like header my16bitheader_t { (...) }) that extracts 16 bits just after UDP. You can also use packet.lookahead<T>(), packet.advance(sizeInBits) or packet.extract<T>(_) (depending on your particular use case) if the 16 bits you mention happen not to be located just after UDP. I am not sure what you want to achieve with this method, but it is possible to do.
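Here is a minimal sketch of how these pieces might fit together. The header `tag16_t`, the marker value `0xABCD` and the 4-byte skip are all invented for the example. Only `extract`, `lookahead` and `advance` come from the `packet_in` extern in core.p4, and the fragment assumes the cursor starts at the UDP header.

```p4
#include <core.p4>

header udp_t {
    bit<16> srcPort;
    bit<16> dstPort;
    bit<16> length;
    bit<16> checksum;
}

// Hypothetical 16-bit header taken from the UDP payload.
header tag16_t {
    bit<16> value;
}

struct headers_t {
    udp_t   udp;
    tag16_t tag;
}

parser TagParser(packet_in packet, out headers_t hdr) {
    state start {
        packet.extract(hdr.udp);
        // lookahead reads the next 16 bits without moving the cursor.
        transition select(packet.lookahead<bit<16>>()) {
            0xABCD: skip_then_parse_tag;   // hypothetical marker: tag sits 4 bytes further in
            default: parse_tag;            // otherwise the tag follows UDP directly
        }
    }

    state parse_tag {
        packet.extract(hdr.tag);
        transition accept;
    }

    state skip_then_parse_tag {
        // advance moves the cursor forward by the given number of bits
        // without storing them anywhere.
        packet.advance(32);
        packet.extract(hdr.tag);
        transition accept;
    }
}
```

After parsing, the 16 bits are available as hdr.tag.value, just like the fields of hdr.udp.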

Cheers,