To what extent can P4 extract attributes/headers/etc. from HTTPS packets?

I understand that P4 can extract information from protocols at Layer 4 and below:

```
header ethernet_t {
    bit<48> dstAddr;
    bit<48> srcAddr;
    bit<16> etherType;
}

header ipv4_t {
    bit<4>  version;
    bit<4>  ihl;
    bit<8>  diffserv;
    bit<16> totalLen;
    bit<16> identification;
    bit<3>  flags;
    bit<13> fragOffset;
    bit<8>  ttl;
    bit<8>  protocol;
    bit<16> hdrChecksum;
    bit<32> srcAddr;
    bit<32> dstAddr;
}

header tcp_t {
    bit<16> srcPort;
    bit<16> dstPort;
    bit<32> seqNo;
    bit<32> ackNo;
    bit<4>  dataOffset;
    bit<3>  res;
    bit<9>  tcp_flags;
    bit<16> window;
    bit<16> checksum;
    bit<16> urgentPtr;
}

header udp_t {
    bit<16> srcPort;
    bit<16> dstPort;
    bit<16> udplen;
    bit<16> udpchk;
}

header icmp_t {
    bit<8>  type;
    bit<8>  code;
    bit<16> checksum;
}

struct headers {
    ethernet_t ethernet;
    ipv4_t     ipv4;
    tcp_t      tcp;
    udp_t      udp;
    icmp_t     icmp;
}
```

But how would one go about extracting Layer 6 protocol information, such as HTTPS? Are there examples of this?

To the extent that packets are encrypted, and your P4-programmable device does not have the decryption keys available, any part of the packet that is encrypted is effectively random gibberish to your P4 program.

I believe that for HTTPS, the unencrypted parts of a packet are the Ethernet, IPv4/IPv6, and TCP headers, and everything after the TCP header is encrypted, but I have not recently reviewed HTTPS to verify that.

Yes, I am aware that one would not be able to see the HTTPS content itself (as it is encrypted), but could one obtain packet features such as “Content-Length” from the header?

If by “Content-Length” you mean a field value defined by HTTP, then I am fairly sure that it and every other HTTP field and value are part of the TCP payload data, and are encrypted when you use HTTPS.

Ah yes, you are correct. However, I would still be able to access the TLS header information:

```
header tls_handshake_t {
    bit<8>  handshake_type;
    bit<24> length;
}
```
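For context, a handshake header like the one above does not directly follow the TCP header: every TLS record starts with a 5-byte cleartext record header (content type, version, length), and the handshake header comes after it only when the content type is 22. A minimal sketch of both headers with a sub-parser follows. The names `tls_record_t`, `tls_headers_t`, and `TlsParser` are illustrative, not taken from an existing library. The sketch assumes the caller has already extracted the TCP header, skipped any TCP options, and that a TLS record happens to begin at the first byte of the TCP payload, which TCP does not guarantee.

```p4
#include <core.p4>

// TLS record-layer header: 5 bytes, sent in cleartext before every record.
header tls_record_t {
    bit<8>  content_type;    // 20 change_cipher_spec, 21 alert,
                             // 22 handshake, 23 application_data
    bit<16> legacy_version;  // typically 0x0301 or 0x0303
    bit<16> length;          // record payload length in bytes
}

// Handshake message header. Only readable while the handshake is still
// unencrypted (for example ClientHello and ServerHello).
header tls_handshake_t {
    bit<8>  handshake_type;  // 1 client_hello, 2 server_hello, ...
    bit<24> length;          // handshake message length in bytes
}

struct tls_headers_t {
    tls_record_t    tls_record;
    tls_handshake_t tls_handshake;
}

// Hypothetical sub-parser. ASSUMPTION: the packet cursor already points at
// the first byte of the TCP payload and a TLS record starts exactly there.
parser TlsParser(packet_in packet, out tls_headers_t hdr) {
    state start {
        packet.extract(hdr.tls_record);
        transition select(hdr.tls_record.content_type) {
            22: parse_handshake;   // handshake record
            default: accept;       // alert, application data, etc.
        }
    }

    state parse_handshake {
        packet.extract(hdr.tls_handshake);
        transition accept;
    }
}
```

Segments with fewer than five payload bytes (pure ACKs, for instance) would make the first `extract` fail, so a real parser would probably enter this state only for segments known to carry payload. In TLS 1.3, handshake messages after ServerHello are encrypted and travel in records marked as application data (content type 23), so `handshake_type` is visible only for the first few messages of a connection.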

@bcheevers123,

In addition to the excellent responses provided by @andyfingerhut, I should point out an often overlooked fact: TCP is a streaming protocol, meaning that packet boundaries are largely irrelevant. An individual message can be split across any number of packets (at the extreme end, 1 byte per packet), and it does not need to start on a packet boundary (unless you are looking at the very first byte of the stream).

Usually, properly matching on anything inside a TCP stream (even an unencrypted one) requires you to (a) reassemble the stream and (b) use regular expression matching.

Both (a) and (b) are pretty difficult to implement in P4 unless the target provides a lot of specialized support for this functionality.

Happy hacking,
Vladimir

Thank you @p4prof for your insight here.

Yes, since TCP is a streaming protocol, I’ve written a Python script that interfaces with P4 and attributes each packet to the corresponding flow/stream.

I suppose my main query was which fields one could extract from an HTTPS or TLS packet without knowing the decryption key, for example the number of bytes, size, content length, etc.

Has anyone created headers for this?
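On the "number of bytes" part of the question, the size of the encrypted payload can be derived without any decryption key from fields that are already parsed: TCP payload bytes = `totalLen` - 4 × `ihl` - 4 × `dataOffset`. The sketch below computes that value and accumulates it per flow in a register. It assumes the v1model architecture (for `hash` and `register`). The control name `FlowPayloadBytes`, the metadata field `tcp_payload_len`, and the table size of 1024 slots are illustrative choices. Hash collisions between flows are not handled.

```p4
#include <core.p4>
#include <v1model.p4>

// Same layouts as the ipv4_t and tcp_t headers in the question.
header ipv4_t {
    bit<4>  version;
    bit<4>  ihl;
    bit<8>  diffserv;
    bit<16> totalLen;
    bit<16> identification;
    bit<3>  flags;
    bit<13> fragOffset;
    bit<8>  ttl;
    bit<8>  protocol;
    bit<16> hdrChecksum;
    bit<32> srcAddr;
    bit<32> dstAddr;
}

header tcp_t {
    bit<16> srcPort;
    bit<16> dstPort;
    bit<32> seqNo;
    bit<32> ackNo;
    bit<4>  dataOffset;
    bit<3>  res;
    bit<9>  tcp_flags;
    bit<16> window;
    bit<16> checksum;
    bit<16> urgentPtr;
}

struct headers_t {
    ipv4_t ipv4;
    tcp_t  tcp;
}

struct metadata_t {
    bit<16> tcp_payload_len;  // bytes after the TCP header in this segment
}

// Hypothetical helper control, meant to be applied from an ingress control.
control FlowPayloadBytes(in headers_t hdr, inout metadata_t meta) {
    // One 32-bit byte counter per hash slot (1024 is an arbitrary size).
    register<bit<32>>(1024) flow_payload_bytes;

    apply {
        if (hdr.ipv4.isValid() && hdr.tcp.isValid()) {
            // ihl and dataOffset count 32-bit words; shift left by 2 for bytes.
            bit<16> ip_hdr_bytes  = ((bit<16>) hdr.ipv4.ihl) << 2;
            bit<16> tcp_hdr_bytes = ((bit<16>) hdr.tcp.dataOffset) << 2;
            // Wraps around on malformed packets whose totalLen is too small.
            meta.tcp_payload_len =
                hdr.ipv4.totalLen - ip_hdr_bytes - tcp_hdr_bytes;

            // Map the flow (addresses, protocol, ports) to a register slot.
            bit<32> idx;
            hash(idx, HashAlgorithm.crc32, (bit<32>) 0,
                 { hdr.ipv4.srcAddr, hdr.ipv4.dstAddr, hdr.ipv4.protocol,
                   hdr.tcp.srcPort, hdr.tcp.dstPort },
                 (bit<32>) 1024);

            // Read-modify-write the running payload byte count for the flow.
            bit<32> total;
            flow_payload_bytes.read(total, idx);
            flow_payload_bytes.write(idx,
                total + (bit<32>) meta.tcp_payload_len);
        }
    }
}
```

As a worked example, a segment with `totalLen` = 1500, `ihl` = 5, and `dataOffset` = 8 carries 1500 - 20 - 32 = 1448 payload bytes. An HTTP-level value such as Content-Length stays encrypted, so per-segment and per-flow byte counts like these are about as close as the data plane can get without the key.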