Telemetry header

Hello guys,

I want to use telemetry values such as hop IDs for path tracing through a telemetry header. I’ve looked online and found this (https://p4.org/p4-spec/docs/INT_v2_1.pdf). Section 5 says something about INT headers, but I got confused.

Will the telemetry header that I have to write look like the P4 files from the GitHub tutorials, or do I have to do something else?

Thanks in advance!!

One perspective on the INT specification is that it does not specify only one possible way to add INT headers to data packets. Instead, it specifies several possible ways to add them, e.g. one option is to add them after a TCP or UDP header. Another is to add them after the IP header, but before any L4 header. I think there might be one or two other possibilities mentioned in that spec.

Thus, you have choices to make in an implementation. You need not implement all of those options, and if you do not, you must choose which you want.

I suspect there is probably at least one such option implemented in open source and published somewhere, and perhaps several options, but unfortunately I do not have a link handy to give you right now. Hopefully someone else reads this thread who knows and can supply one.


Hi @AndyJohnson,

Sorry for the late response, I saw it some time ago but I was away from home.

Short response:

You do not have to follow the specification. You can go one by one and create fields for every INT header, or create your own custom INT header(s). I have seen this in papers that used a minimal part of INT. To understand Section 5, please check the long response.

Long response:

I have an old implementation of INT that I need to open-source at some point. I have not tidied it up so it will take me a little bit of time. I might also update BMv2 and P4RT libraries. If you can wait some time (until next Monday), I will try to upload it to a public repository.

If you really want to understand the INT spec, it will take you several readings, so do not worry. I generally work with INT-MD (INT is appended to the packet header stack). You have several ways to append the INT headers (shim and metadata headers) plus the INT metadata stack (which incorporates data from each switch, hence the “stack”). Nowadays, apart from INT-MD, you can also find information about INT-XD (the old postcard mode: each switch generates a report based on a watchlist as the packet traverses the path) and INT-MX (each node sends a report to the collector based on INT instructions embedded in the packet), but I have not implemented them.

Usually, when I make a demo or a proof-of-concept implementation, I place all headers involved in INT between TCP/UDP and the payload (there are plenty of ways to incorporate INT into a packet, as you can see in Section 5). Information about TCP and UDP is in Section 5.7.2, which also specifies how to determine whether the INT headers and stack are present:

(1) Use a “mock” UDP header (between the original L3 and L4 headers, e.g. IPv4 and TCP) with a specific destination port (let’s say 1234), or the original UDP header if the packet already carries one. If the packet carried UDP from the very beginning, the INT shim stores the original port in one of its fields, and the last switch in the path should restore it when removing INT from the packet.

(2) If you do not like it that way, use the IPv4 DSCP or IPv6 Traffic Class field. You can just come up with a particular value (I saw 0x17 in an old spec and I like to use it). Keep in mind that this configuration and the 0x17 value might collide with actual values used for prioritization.

(3) Finally, the spec proposes including a 64-bit probe marker with a specific and “unique” value for INT. This way of detecting INT might also collide with a value that a switch/router expects after TCP or UDP.

The nice thing about telemetry and INT is that you are not bound to any of these methods. You can go with a demo that expects INT to always be placed after TCP or UDP (not a good choice for production use), or use any other header field to determine the presence of INT (like a reserved bit in a header of your choice).

Once you can determine the presence of the INT headers (the INT shim in this case, as it is the first one), you can determine the presence of the INT metadata header and the stack by checking the INT shim’s length field. The INT metadata header (Section 5.8) determines, among other things, the kind of information each switch must add to the stack (the Instruction Bitmap), like the queue latency and/or the switch ID. The spec includes plenty of examples, but it is up to you which information you want to add, which instruction bits to use and how you want to organize it.

The way in which INT source, INT transit and INT sink nodes behave is explained at the end of Section 5.8. The nodes perform different actions. In summary, the INT source node (typically the first node after the source host) determines whether a packet shall carry INT headers (if those were not added by the source host) and adds the INT headers and its own data to the metadata stack. Then each INT transit node determines whether INT is present and adds its data according to the instruction bitmap set by the INT source. Generally, the INT sink clones the packet, sends one report to the collector (see the INT Telemetry Report format spec) and removes the INT headers before sending the packet to the destination host (as if INT had never been present in the header stack). Keep in mind that there are plenty of challenges in the process, such as exceeding the MTU, which you need to take into account.

I guess that you have already seen that you do not need to comply with the spec. When I made a demo, I only used a handful of fields from the INT shim and metadata headers, like the length field, the instruction bitmap or NPT. You can use the official headers, but you will end up using only a few of their fields, so do not worry about that. Feel free to choose a spec-“compliant” header format or your own if you wish to. Take a look at the one I used. It is probably not compliant with the latest spec, but I will try to update it.

/* INT shim header for TCP/UDP */
header int_shim_t {
    bit<8>  int_type;   // hop-by-hop or destination header
    bit<8>  rsvd1;
    bit<8>  len;  // Total INT length in 4-byte words (shim + INT metadata header + stack)
    bit<6>  dscp; // Store original DSCP value (if DSCP is used) else reserved
    // Will use this field in the paper
    // if 0x1 and remaining_hop_cnt is >1 then broadcast, else send to CPU
    bit<2>  rsvd2;
} // 4 bytes

/* INT header */
/* 16 instruction bits are defined in four 4b fields to allow concurrent
lookups of the bits without listing 2^16 combinations */
header int_meta_t {
    bit<4> ver;
    bit<2> rep;
    bit<1> c;
    bit<1> e;
    bit<1> m;
    bit<7> rsvd1;
    bit<3> rsvd2;
    bit<5> hop_metadata_len;
    bit<8> remaining_hop_cnt;
    bit<4> instruction_mask_0003; // check instructions from bit 0 to bit 3
    bit<4> instruction_mask_0407; // check instructions from bit 4 to bit 7
    bit<4> instruction_mask_0811; // check instructions from bit 8 to bit 11
    bit<4> instruction_mask_1215; // check instructions from bit 12 to bit 15
    bit<16> rsvd3;
} // 8 bytes


// HEADERS USED IN BITS 0-3
/* INT meta-value headers - different header for each value type */

// bit 0:
header int_switch_id_t {
    bit<32> switch_id;
}

// bit 1:
header int_level1_port_ids_t {
    bit<16> ingress_port_id;
    bit<16> egress_port_id;
}

// bit 2: deq_timedelta
// the time, in microseconds, that the packet spent in the queue.
header int_hop_latency_t {
    // (bit<32>) standard_metadata.deq_timedelta;
    bit<32> hop_latency;
}

// bit 3: deq_qdepth;
// https://github.com/p4lang/behavioral-model/issues/311
// https://github.com/p4lang/behavioral-model/issues/493
// the depth of queue when the packet was dequeued.
header int_q_occupancy_t {
    bit<8>  q_id; // looks like not supported
    // (bit<24>) standard_metadata.deq_qdepth;
    bit<24> q_occupancy;
}

// HEADERS USED IN BITS 4-7

// bit 4
// a timestamp, in microseconds, set when the packet shows up on ingress.
// The clock is set to 0 every time the switch starts. This field can be read
// directly from either pipeline (ingress and egress) but should not be
// written to.
// ingress_global_timestamp
header int_ingress_tstamp_t {
    bit<32> ingress_tstamp;
}

// bit 5
// egress_global_timestamp
header int_egress_tstamp_t {
    bit<32> egress_tstamp;
}

// bit 6:
header int_level2_port_ids_t {
    bit<32> ingress_port_id;
    bit<32> egress_port_id;
}

// bit 7
header int_egress_port_tx_util_t {
    bit<32> egress_port_tx_util;
}

// Other INT standard_metadata based headers
// enq_qdepth;
// the depth of queue when the packet was enqueued.
header int_enqueue_occupancy_t {
    bit<32> enq_occupancy;
}

header int_enqueue_tstamp_t {
    bit<32> enq_timestamp;
}

/* switch internal variables for INT logic implementation */
header int_metadata_stack_t {
    // Maximum int metadata stack size in bits:
    // (0xFF - 3) * 32 (excluding INT shim header and INT metadata header)
    // EDER: Can we express this as a function of max_hop or should we
    // just consider 8064?
    // I think this is (MAX_shim.len - 3) * TO_BITS
    // (255 - 3) * 32 = 8064
    // - MAX field value for shim.len (8 bits) = 255
    // - shim and metadata header length (in 4 byte words) = 3
    // - WORD_TO_BITS = * 32
    varbit<8064> data;
}

@andyfingerhut thanks a lot for your response.

@ederollora hi!!

No need to apologize, it means a lot to me that you responded.

First of all, you’re way too helpful and I appreciate that. I read all of your responses and you were very insightful.

Of course I can wait, and I really thank you, but before you spend your time on this, let me tell you that right now I only need to create an INT header for path tracing. This header must add to the packet the switch ID of every switch that it goes through.

You wrote this for me, but I was thinking of writing something like this: https://github.com/p4lang/tutorials/blob/master/exercises/mri/mri.p4. Is this right, or am I thinking about it wrong? I gave your code a quick look, but I will look at it closely right now!

I’m not sure I understand that value. Is this a specific number that is used to determine that we have INT?

It takes some time to fully open the repository and clean up the personal notes and unhelpful commented-out code, so please be patient :)

As @andyfingerhut and I mentioned, you do not really “need” INT to accomplish path tracing. Your telemetry header can be as small as one or two bytes. You can track how many nodes have added their ID to the packet either by using a counter field in your own custom telemetry header, or by doing something similar to what MPLS does. It is totally up to you where this telemetry header is placed and how the switches treat it. The exercise you link is a perfect example. As I see it, INT is an effort to bring the collection and reporting of network state into a framework or standard. But you do not need to comply with it at all, especially if your purpose is not production oriented.
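To make the counter idea concrete, here is a minimal sketch in the style of the linked MRI exercise. Everything in it is an assumption made for illustration: the names (`trace_base_t`, `trace_hop_t`, `TYPE_TRACE`, `MAX_HOPS`, `AddHop`, `trace_tbl`), the placement of the header right after Ethernet, and the EtherType 0x88B5 taken from the local experimental range. It only contains the declarations (headers, parser and the control that records a hop); the step where the first switch inserts `trace_base` and the rest of the v1model pipeline are left out.

```p4
#include <core.p4>
#include <v1model.p4>

const bit<16> TYPE_TRACE = 0x88B5;  // assumption: local experimental EtherType
#define MAX_HOPS 8                   // assumption: longest path we want to record

header ethernet_t {
    bit<48> dst_addr;
    bit<48> src_addr;
    bit<16> ether_type;
}

// Fixed part of the custom telemetry header.
header trace_base_t {
    bit<16> next_header;  // EtherType of the original payload
    bit<16> count;        // number of trace_hop_t entries that follow
}

// One entry per switch on the path.
header trace_hop_t {
    bit<32> switch_id;
}

struct headers_t {
    ethernet_t            ethernet;
    trace_base_t          trace_base;
    trace_hop_t[MAX_HOPS] trace_hops;
}

struct metadata_t {
    bit<16> remaining;    // parser loop counter
}

parser TraceParser(packet_in packet, out headers_t hdr, inout metadata_t meta,
                   inout standard_metadata_t standard_metadata) {
    state start {
        packet.extract(hdr.ethernet);
        transition select(hdr.ethernet.ether_type) {
            TYPE_TRACE: parse_trace_base;
            default: accept;
        }
    }
    state parse_trace_base {
        packet.extract(hdr.trace_base);
        meta.remaining = hdr.trace_base.count;
        transition select(meta.remaining) {
            0: accept;
            default: parse_trace_hop;
        }
    }
    // Loop: extract one hop entry per iteration until the counter reaches 0.
    state parse_trace_hop {
        packet.extract(hdr.trace_hops.next);
        meta.remaining = meta.remaining - 1;
        transition select(meta.remaining) {
            0: accept;
            default: parse_trace_hop;
        }
    }
}

control AddHop(inout headers_t hdr) {
    // switch_id is supplied by the control plane as the default action argument.
    action add_hop(bit<32> switch_id) {
        hdr.trace_base.count = hdr.trace_base.count + 1;
        hdr.trace_hops.push_front(1);
        hdr.trace_hops[0].setValid();
        hdr.trace_hops[0].switch_id = switch_id;
    }
    table trace_tbl {
        actions = { add_hop; NoAction; }
        default_action = NoAction();
    }
    apply {
        if (hdr.trace_base.isValid()) {
            trace_tbl.apply();
        }
    }
}
```

The counter lets the parser know how many entries to extract without any per-entry flag. The price is that every switch has to update two places: the counter and the stack.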

This is the value that I used in the parser to determine whether INT was present in the packet. Because I placed INT between L4 and its payload, the check is done in the TCP or UDP parser state. See the example:

state parse_tcp {
    packet.extract(hdr.tcp);
    transition select(hdr.ipv4.dscp) {
        DSCP_INT: parse_int_shim;
        default: accept;
    }
}

state parse_udp {
    packet.extract(hdr.udp);
    transition select(hdr.ipv4.dscp) {
        DSCP_INT: parse_int_shim;
        default: accept;
    }
}

state parse_int_shim {
    packet.extract(hdr.int_shim);
    transition parse_int_meta; //You should check shim length
}

state parse_int_meta {
    packet.extract(hdr.int_meta);
    transition parse_int_metadata_stack; //In reality you should check shim length
}

state parse_int_metadata_stack { //Taken from P4apps, Joghwan
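    // Widen to 32 bits before shifting: << keeps the type of its left operand,
    // so an 8-bit shift would wrap for stacks of 8 words or more.
    packet.extract(hdr.int_metadata_stack, ((bit<32>) (hdr.int_shim.len - 3)) << 5);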
    transition accept;
}
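As a worked example of the length arithmetic in parse_int_metadata_stack, assume (as the header comments above do) that the shim's len field counts 4-byte words and includes the 1-word shim and the 2-word INT metadata header. The four hops and the single 32-bit switch_id per hop are assumptions picked for illustration:

```latex
\begin{aligned}
\text{len} &= 1 + 2 + 4 \times 1 = 7 \ \text{words} \\
\text{stack bits} &= (\text{len} - 3) \times 32 = (7 - 3) \ll 5 = 128
\end{aligned}
```

The shift has to be evaluated at 32-bit width. As far as I understand P4_16, the result of << has the type of its left operand, so on an 8-bit value a len of 11 would give 8 << 5 = 256, which wraps around to 0.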

In your case, and to keep it simple, I would follow a procedure similar to how INT-MD works (i.e., packets carry the telemetry data) and use the switch roles described in the specification (INT source, transit and sink), but with a very simple header. You can try something similar to an MPLS header, or even smaller. You can also use a single bit in the header that determines whether any more path tracing headers are left in the packet. Since an MPLS header parser has already been implemented by many people, you just have to adapt it to your own telemetry header. That custom telemetry header can replace the INT shim, metadata and metadata stack headers from the spec with just one header of a couple of bytes. It might include only the necessary information, like one field to hold the relevant data (switch_id) and a next_header field.
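A minimal sketch of that last idea, written as a small v1model program. It is only an illustration under several assumptions: the header sits right after Ethernet, `TYPE_TRACE` (0x88B5, local experimental EtherType), `MAX_HOPS`, `trace_t`, `fwd` and `trace_tbl` are names I made up, and the next_header field holds an EtherType so that the parser can chain entries the way an MPLS parser follows the bottom-of-stack bit. The sink step that strips the entries and reports them is not shown.

```p4
#include <core.p4>
#include <v1model.p4>

const bit<16> TYPE_TRACE = 0x88B5;  // assumption: local experimental EtherType
#define MAX_HOPS 8                   // assumption: longest path we want to record

header ethernet_t {
    bit<48> dst_addr;
    bit<48> src_addr;
    bit<16> ether_type;
}

// The whole telemetry header: 4 bytes per hop.
header trace_t {
    bit<16> switch_id;
    bit<16> next_header;  // TYPE_TRACE if another entry follows, else original EtherType
}

struct headers_t {
    ethernet_t        ethernet;
    trace_t[MAX_HOPS] trace;
}
struct metadata_t { }

parser TraceParser(packet_in packet, out headers_t hdr, inout metadata_t meta,
                   inout standard_metadata_t standard_metadata) {
    state start {
        packet.extract(hdr.ethernet);
        transition select(hdr.ethernet.ether_type) {
            TYPE_TRACE: parse_trace;
            default: accept;
        }
    }
    // Keep extracting entries while next_header says another one follows.
    state parse_trace {
        packet.extract(hdr.trace.next);
        transition select(hdr.trace.last.next_header) {
            TYPE_TRACE: parse_trace;
            default: accept;
        }
    }
}

control TraceVerifyChecksum(inout headers_t hdr, inout metadata_t meta) { apply { } }

control TraceIngress(inout headers_t hdr, inout metadata_t meta,
                     inout standard_metadata_t standard_metadata) {
    action drop() { mark_to_drop(standard_metadata); }
    action set_egress(bit<9> port) { standard_metadata.egress_spec = port; }
    // Placeholder forwarding: port in, port out.
    table fwd {
        key = { standard_metadata.ingress_port : exact; }
        actions = { set_egress; drop; }
        default_action = drop();
    }
    apply { fwd.apply(); }
}

control TraceEgress(inout headers_t hdr, inout metadata_t meta,
                    inout standard_metadata_t standard_metadata) {
    // Source and transit behave the same: push one entry on top.
    // Note: pushing onto a full stack silently drops the oldest entry.
    action add_hop(bit<16> switch_id) {
        hdr.trace.push_front(1);
        hdr.trace[0].setValid();
        hdr.trace[0].switch_id = switch_id;
        hdr.trace[0].next_header = hdr.ethernet.ether_type;
        hdr.ethernet.ether_type = TYPE_TRACE;
    }
    // Keyless table: the control plane sets add_hop(<id>) as the default action.
    table trace_tbl {
        actions = { add_hop; NoAction; }
        default_action = NoAction();
    }
    apply { trace_tbl.apply(); }
}

control TraceComputeChecksum(inout headers_t hdr, inout metadata_t meta) { apply { } }

control TraceDeparser(packet_out packet, in headers_t hdr) {
    apply {
        packet.emit(hdr.ethernet);
        packet.emit(hdr.trace);   // emits only the valid entries
    }
}

V1Switch(TraceParser(), TraceVerifyChecksum(), TraceIngress(), TraceEgress(),
         TraceComputeChecksum(), TraceDeparser()) main;
```

Because each entry names what follows it, no separate counter or shim is needed, and the first switch does not need special-case logic. A path longer than MAX_HOPS is not handled here and would need an explicit check.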

I hope the answer helps you,