Hi @AndyJohnson,
Sorry for the late response. I saw your question some time ago, but I was away from home.
Short response:
You do not have to follow the specification: you can go through the INT headers one by one and define their fields, or create your own custom INT header(s). I have seen this in papers that used only a minimal part of INT. To understand Section 5, please check the long response.
Long response:
I have an old implementation of INT that I need to open-source at some point. I have not tidied it up so it will take me a little bit of time. I might also update BMv2 and P4RT libraries. If you can wait some time (until next Monday), I will try to upload it to a public repository.
If you really want to understand the INT spec, it will take you several readings, so do not worry. I generally work with INT-MD (INT data is embedded in the packet's header stack). You have several ways to append the INT headers (shim and metadata headers) plus the INT metadata stack (which accumulates data from each switch, hence the “stack”). Nowadays, apart from INT-MD, you can also find information about INT-XD (the old postcard mode: each switch generates a report based on a watchlist as the packet traverses the path) and INT-MX (each node sends a report to the collector based on INT instructions embedded in the packet), but I have not implemented them.
Usually, when I make a demo or a proof-of-concept implementation, I place all the INT headers between TCP/UDP and the payload (there are plenty of ways to incorporate INT into a packet, as you can see in Section 5). Information about TCP and UDP is located in Section 5.7.2, which also specifies how to determine whether the INT headers and stack are present:
(1) Use a “mock” UDP header (between the original L3 and L4 headers, e.g. IPv4 and TCP) with a specific destination port (let’s say 1234), or the original UDP header if the packet already carries one. If the packet carries UDP from the very beginning, the INT shim stores the original port in one of its fields, and the last switch in the path should restore it when removing INT from the packet.
(2) Use the IPv4 DSCP or IPv6 Traffic Class field. You can just come up with a particular value (I saw 0x17 in an old spec and I like to use it). Keep in mind that this configuration and the 0x17 value might collide with values actually used for prioritization.
(3) Include a 64-bit probe marker with a specific and “unique” value for INT. This way of detecting INT might also collide with a particular value that a switch/router expects after TCP or UDP.
The nice thing about telemetry and INT is that you are not bound to any of these methods. You can go from a demo that assumes INT is always placed after TCP or UDP (not a good idea for production use) to using any other header field to determine the presence of INT (like a reserved bit in a header of your choice).
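As a rough illustration of options (1) and (2), here is a minimal P4_16 parser sketch for v1model. It is only a sketch under my own assumptions: the names `DSCP_INT`, `INT_UDP_PORT` and `IntDetectParser` are made up for this example, 1234 and 0x17 are just the example values mentioned above, IPv4 options are ignored, and it assumes the INT shim sits right after the UDP header.

```p4
#include <core.p4>
#include <v1model.p4>

// Hypothetical marker values for this sketch; pick whatever suits your setup.
const bit<6>  DSCP_INT     = 0x17;  // option (2): DSCP value marking INT
const bit<16> INT_UDP_PORT = 1234;  // option (1): UDP destination port marking INT

header ethernet_t {
    bit<48> dst_addr;
    bit<48> src_addr;
    bit<16> ether_type;
}

header ipv4_t {
    bit<4>  version;
    bit<4>  ihl;
    bit<6>  dscp;
    bit<2>  ecn;
    bit<16> total_len;
    bit<16> identification;
    bit<3>  flags;
    bit<13> frag_offset;
    bit<8>  ttl;
    bit<8>  protocol;
    bit<16> hdr_checksum;
    bit<32> src_addr;
    bit<32> dst_addr;
}

header udp_t {
    bit<16> src_port;
    bit<16> dst_port;
    bit<16> len;
    bit<16> checksum;
}

header int_shim_t {
    bit<8> int_type;
    bit<8> rsvd1;
    bit<8> len;
    bit<6> dscp;
    bit<2> rsvd2;
}

struct headers_t {
    ethernet_t ethernet;
    ipv4_t     ipv4;
    udp_t      udp;
    int_shim_t int_shim;
}

struct metadata_t { }

parser IntDetectParser(packet_in pkt,
                       out headers_t hdr,
                       inout metadata_t meta,
                       inout standard_metadata_t std_meta) {
    state start {
        pkt.extract(hdr.ethernet);
        transition select(hdr.ethernet.ether_type) {
            0x0800:  parse_ipv4;
            default: accept;
        }
    }
    state parse_ipv4 {
        // Assumes ihl == 5 (no IPv4 options) to keep the sketch short.
        pkt.extract(hdr.ipv4);
        transition select(hdr.ipv4.protocol) {
            17:      parse_udp;
            default: accept;
        }
    }
    state parse_udp {
        pkt.extract(hdr.udp);
        // INT is assumed present if either marker matches.
        transition select(hdr.ipv4.dscp, hdr.udp.dst_port) {
            (DSCP_INT, _):     parse_int_shim;
            (_, INT_UDP_PORT): parse_int_shim;
            default:           accept;
        }
    }
    state parse_int_shim {
        pkt.extract(hdr.int_shim);
        transition accept;
    }
}
```

In a real program you would normally pick one of the two markers rather than accept both, and add a TCP branch if you also want INT after TCP.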
Once you can determine the presence of the INT headers (the INT shim in this case, as it is the first one), you can determine the presence of the INT metadata header and the stack by checking the INT shim’s length field. The INT metadata header (Section 5.8) determines, among other things, the kind of information each switch must add to the stack (Instruction Bitmap), like the queue latency and/or switch ID. The spec includes plenty of examples, but it is up to you which information you want to add, which instruction bits to use and how you want to organize it.
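To make the length check concrete, a sub-parser along these lines could extract the shim, the metadata header and the stack. This is a sketch and not my actual implementation: `IntParser` and `int_headers_t` are hypothetical names, and it assumes the shim `len` field counts 4-byte words including the shim itself (1 word) and the metadata header (2 words).

```p4
#include <core.p4>

header int_shim_t {
    bit<8> int_type;
    bit<8> rsvd1;
    bit<8> len;      // assumed: total INT length in 4-byte words
    bit<6> dscp;
    bit<2> rsvd2;
}

header int_meta_t {
    bit<4>  ver;
    bit<2>  rep;
    bit<1>  c;
    bit<1>  e;
    bit<1>  m;
    bit<7>  rsvd1;
    bit<3>  rsvd2;
    bit<5>  hop_metadata_len;
    bit<8>  remaining_hop_cnt;
    bit<4>  instruction_mask_0003;
    bit<4>  instruction_mask_0407;
    bit<4>  instruction_mask_0811;
    bit<4>  instruction_mask_1215;
    bit<16> rsvd3;
}

header int_metadata_stack_t {
    varbit<8064> data;
}

// Hypothetical container for the INT part of the header stack.
struct int_headers_t {
    int_shim_t           int_shim;
    int_meta_t           int_meta;
    int_metadata_stack_t int_stack;
}

parser IntParser(packet_in pkt, out int_headers_t hdr) {
    state start {
        pkt.extract(hdr.int_shim);
        // Shim (1 word) + metadata header (2 words) must fit in len.
        verify(hdr.int_shim.len >= 3, error.HeaderTooShort);
        transition parse_int_meta;
    }
    state parse_int_meta {
        pkt.extract(hdr.int_meta);
        transition select(hdr.int_shim.len) {
            3:       accept;           // headers only, empty stack
            default: parse_int_stack;
        }
    }
    state parse_int_stack {
        // Words left after shim and metadata header, converted to bits (* 32).
        bit<32> stack_bits = ((bit<32>)(hdr.int_shim.len - 3)) << 5;
        pkt.extract(hdr.int_stack, stack_bits);
        transition accept;
    }
}
```

For example, with `len = 5` the stack holds (5 - 3) * 32 = 64 bits.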
The way INT source, INT transit and INT sink nodes behave is explained at the end of Section 5.8. Each node performs different actions. In summary, the INT source node (typically the first node after the source host) determines whether a packet shall carry INT headers (if they were not already added by the source host), then adds the INT headers and its own data to the metadata stack. An INT transit node then determines whether INT is present and adds data according to the instruction bitmap set by the INT source. Generally, the INT sink clones the packet, sends a report to the collector (see the INT Telemetry Report format spec) and removes the INT headers before sending the packet to the destination host (as if INT had never been present in the header stack). Keep in mind that there are plenty of challenges in the process, such as exceeding the MTU, which you need to take into account.
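The transit behaviour could be sketched roughly as follows. Everything here is an assumption for illustration: `IntTransit` and `int_headers_t` are hypothetical names, the switch ID and hop latency are passed in as parameters (on BMv2 the latency would typically come from `standard_metadata.deq_timedelta`), instruction bit 0 is taken to be the most significant bit of the first nibble, and setting `e` when the hop count is exhausted is my reading of the spec. The sketch does not update the IPv4/UDP length fields or check the MTU, which a real implementation must do.

```p4
#include <core.p4>

header int_shim_t {
    bit<8> int_type;
    bit<8> rsvd1;
    bit<8> len;      // assumed: total INT length in 4-byte words
    bit<6> dscp;
    bit<2> rsvd2;
}

header int_meta_t {
    bit<4>  ver;
    bit<2>  rep;
    bit<1>  c;
    bit<1>  e;
    bit<1>  m;
    bit<7>  rsvd1;
    bit<3>  rsvd2;
    bit<5>  hop_metadata_len;
    bit<8>  remaining_hop_cnt;
    bit<4>  instruction_mask_0003;
    bit<4>  instruction_mask_0407;
    bit<4>  instruction_mask_0811;
    bit<4>  instruction_mask_1215;
    bit<16> rsvd3;
}

header int_switch_id_t {
    bit<32> switch_id;
}

header int_hop_latency_t {
    bit<32> hop_latency;
}

// Hypothetical container: only the per-hop headers for bits 0 and 2.
struct int_headers_t {
    int_shim_t        int_shim;
    int_meta_t        int_meta;
    int_switch_id_t   int_switch_id;
    int_hop_latency_t int_hop_latency;
}

control IntTransit(inout int_headers_t hdr,
                   in bit<32> switch_id,
                   in bit<32> hop_latency) {
    apply {
        // Nothing to do if the packet does not carry INT.
        if (!hdr.int_shim.isValid() || !hdr.int_meta.isValid()) {
            return;
        }
        // No hops left: flag it and leave the stack untouched.
        if (hdr.int_meta.remaining_hop_cnt == 0) {
            hdr.int_meta.e = 1;
            return;
        }
        bit<8> added_words = 0;
        // Instruction bit 0 (MSB of the nibble): switch ID.
        if (hdr.int_meta.instruction_mask_0003[3:3] == 1) {
            hdr.int_switch_id.setValid();
            hdr.int_switch_id.switch_id = switch_id;
            added_words = added_words + 1;
        }
        // Instruction bit 2: hop latency.
        if (hdr.int_meta.instruction_mask_0003[1:1] == 1) {
            hdr.int_hop_latency.setValid();
            hdr.int_hop_latency.hop_latency = hop_latency;
            added_words = added_words + 1;
        }
        // Account for the 4-byte words this hop added.
        hdr.int_shim.len = hdr.int_shim.len + added_words;
        hdr.int_meta.remaining_hop_cnt = hdr.int_meta.remaining_hop_cnt - 1;
    }
}
```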
I guess you have already seen that you do not need to comply with the spec. When I made a demo, I only used a handful of fields from the INT shim and metadata headers, like the length field, the instruction bitmap or NPT. You can use the official headers, but you will only end up using a few of their fields, so do not worry about that. Feel free to choose a spec-“compliant” header format or your own if you wish to. Take a look at the one I used; it is probably not compliant with the latest spec, but I will try to update it.
/* INT shim header for TCP/UDP */
header int_shim_t {
    bit<8> int_type; // hop-by-hop or destination header
    bit<8> rsvd1;
    bit<8> len;      // Total INT length in 4-byte words (shim + metadata header + stack)
    bit<6> dscp;     // Store original DSCP value (if DSCP is used) else reserved
    // Will use this field in the paper
    // if 0x1 and remaining_hop_cnt is >1 then broadcast, else send to CPU
    bit<2> rsvd2;
} // 4 bytes
/* INT header */
/* 16 instruction bits are defined in four 4b fields to allow concurrent
   lookups of the bits without listing 2^16 combinations */
header int_meta_t {
    bit<4>  ver;
    bit<2>  rep;
    bit<1>  c;
    bit<1>  e;
    bit<1>  m;
    bit<7>  rsvd1;
    bit<3>  rsvd2;
    bit<5>  hop_metadata_len;
    bit<8>  remaining_hop_cnt;
    bit<4>  instruction_mask_0003; // check instructions from bit 0 to bit 3
    bit<4>  instruction_mask_0407; // check instructions from bit 4 to bit 7
    bit<4>  instruction_mask_0811; // check instructions from bit 8 to bit 11
    bit<4>  instruction_mask_1215; // check instructions from bit 12 to bit 15
    bit<16> rsvd3;
} // 8 bytes
// HEADERS USED IN BITS 0-3
/* INT meta-value headers - different header for each value type */
// bit 0:
header int_switch_id_t {
    bit<32> switch_id;
}
// bit 1:
header int_level1_port_ids_t {
    bit<16> ingress_port_id;
    bit<16> egress_port_id;
}
// bit 2: deq_timedelta
// the time, in microseconds, that the packet spent in the queue.
header int_hop_latency_t {
    // (bit<32>) standard_metadata.deq_timedelta;
    bit<32> hop_latency;
}
// bit 3: deq_qdepth;
// https://github.com/p4lang/behavioral-model/issues/311
// https://github.com/p4lang/behavioral-model/issues/493
// the depth of queue when the packet was dequeued.
header int_q_occupancy_t {
    bit<8>  q_id; // does not seem to be supported
    // (bit<24>) standard_metadata.deq_qdepth;
    bit<24> q_occupancy;
}
// HEADERS USED IN BITS 4-7
// bit 4
// a timestamp, in microseconds, set when the packet shows up on ingress.
// The clock is set to 0 every time the switch starts. This field can be read
// directly from either pipeline (ingress and egress) but should not be
// written to.
// ingress_global_timestamp
header int_ingress_tstamp_t {
    bit<32> ingress_tstamp;
}
// bit 5
// egress_global_timestamp
header int_egress_tstamp_t {
    bit<32> egress_tstamp;
}
// bit 6:
header int_level2_port_ids_t {
    bit<32> ingress_port_id;
    bit<32> egress_port_id;
}
// bit 7
header int_egress_port_tx_util_t {
    bit<32> egress_port_tx_util;
}
// Other INT standard_metadata based headers
// enq_qdepth;
// the depth of queue when the packet was enqueued.
header int_enqueue_occupancy_t {
    bit<32> enq_occupancy;
}
header int_enqueue_tstamp_t {
    bit<32> enq_timestamp;
}
/* switch internal variables for INT logic implementation */
header int_metadata_stack_t {
    // Maximum int metadata stack size in bits:
    // (0xFF - 3) * 32 (excluding INT shim header and INT metadata header)
    // EDER: Can we express this as a function of max_hop or should we
    // just consider 8064?
    // I think this is (MAX_shim.len - 3) * WORD_TO_BITS
    // (255 - 3) * 32 = 8064
    // - MAX field value for shim.len (8 bits) = 255
    // - shim and metadata header length (in 4 byte words) = 3
    // - WORD_TO_BITS = 32
    varbit<8064> data;
}
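To show how these headers end up on the wire, here is a small deparser sketch for the layout I described (INT between UDP and the payload). The names `IntDeparser` and `headers_t` are hypothetical, only the switch ID is shown as per-hop data, and emitting this hop's data before the older stack follows my understanding that each node pushes its metadata on top of the stack.

```p4
#include <core.p4>

header udp_t {
    bit<16> src_port;
    bit<16> dst_port;
    bit<16> len;
    bit<16> checksum;
}

header int_shim_t {
    bit<8> int_type;
    bit<8> rsvd1;
    bit<8> len;
    bit<6> dscp;
    bit<2> rsvd2;
}

header int_meta_t {
    bit<4>  ver;
    bit<2>  rep;
    bit<1>  c;
    bit<1>  e;
    bit<1>  m;
    bit<7>  rsvd1;
    bit<3>  rsvd2;
    bit<5>  hop_metadata_len;
    bit<8>  remaining_hop_cnt;
    bit<4>  instruction_mask_0003;
    bit<4>  instruction_mask_0407;
    bit<4>  instruction_mask_0811;
    bit<4>  instruction_mask_1215;
    bit<16> rsvd3;
}

header int_switch_id_t {
    bit<32> switch_id;
}

header int_metadata_stack_t {
    varbit<8064> data;
}

// Hypothetical container; a full program would also hold Ethernet, IPv4, etc.
struct headers_t {
    udp_t                udp;
    int_shim_t           int_shim;
    int_meta_t           int_meta;
    int_switch_id_t      int_switch_id;
    int_metadata_stack_t int_stack;
}

control IntDeparser(packet_out pkt, in headers_t hdr) {
    apply {
        pkt.emit(hdr.udp);
        pkt.emit(hdr.int_shim);
        pkt.emit(hdr.int_meta);
        pkt.emit(hdr.int_switch_id); // data added by this hop (top of the stack)
        pkt.emit(hdr.int_stack);     // data added by previous hops
        // The unparsed payload follows automatically after the emitted headers.
    }
}
```

Headers that are not valid are simply skipped by `emit`, so the same deparser works for packets with and without INT.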