Decoding Header Stacks in Python (Scapy)

Hi Niklas,

Short and efficient answer: Scapy can be set up to dissect the header stack automatically (bind_layers() etc.). At the time I could not get this working, so I went step by step and extracted header by header. This is very inefficient, so if you can investigate how to do it properly, I recommend it. Only take my example if you have nothing else. I looked at how Scapy processes IP and other headers, but I could not implement the same approach in time, so I went the longest but easiest way (see the long answer below). Keep in mind that some headers, like IP or TCP, carry options, and TLV-style headers behave similarly (to a certain extent); this matters if you are interested in extracting the INT metadata stack. All such headers must have some kind of length field that defines “how many bytes” you have to extract.
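
To make the length-field point concrete, here is a minimal sketch in plain Python (no Scapy). It reads the IPv4 IHL field and the TCP data offset, both of which count 32-bit words, to find where each header really ends instead of assuming 20 bytes. The function names and the hand-built sample bytes are hypothetical, made up for illustration only.

```python
def ipv4_header_length(ip_bytes):
    """Return the IPv4 header length in bytes, taken from the IHL field."""
    if len(ip_bytes) < 20:
        raise ValueError("truncated IPv4 header")
    version = ip_bytes[0] >> 4      # upper 4 bits of the first byte
    ihl_words = ip_bytes[0] & 0x0F  # lower 4 bits: length in 32-bit words
    if version != 4 or ihl_words < 5:
        raise ValueError("not a valid IPv4 header")
    return ihl_words * 4


def tcp_header_length(tcp_bytes):
    """Return the TCP header length in bytes, taken from the data offset field."""
    if len(tcp_bytes) < 20:
        raise ValueError("truncated TCP header")
    offset_words = tcp_bytes[12] >> 4  # upper 4 bits of byte 12
    if offset_words < 5:
        raise ValueError("not a valid TCP header")
    return offset_words * 4


# Hand-built sample bytes (hypothetical): only the length fields are filled in.
ip_no_options = bytes([0x45]) + bytes(19)
ip_with_options = bytes([0x46]) + bytes(23)
tcp_with_options = bytes(12) + bytes([0x80]) + bytes(19)

print(ipv4_header_length(ip_no_options))    # 20
print(ipv4_header_length(ip_with_options))  # 24
print(tcp_header_length(tcp_with_options))  # 32
```

Values read this way could replace fixed header lengths whenever a packet carries IP or TCP options.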

Long but not so efficient answer: Let me show you my example. I have a private repository with an INT demo VM (hopefully public some day, when I have time) that does exactly this. My way of “decoding” the INT metadata header from each hop is not the best, but it worked. I am not an expert in using Scapy, and when I programmed this I needed it working quickly, so this is the solution I came up with. If I had to do this again, I would probably be more efficient and change some parts of the code. You should know that at the time I extracted the information from Telemetry Reports. I am not sure whether the header remains the same; headers and fields have probably changed if you check the latest specification.

First I define the metadata that I collect and that can be added to the stack, for example the ingress global timestamp or the queue ID and queue occupancy. I list just a couple of examples so you get the idea:

from scapy.all import Packet, BitField, ByteField, IntField, ShortField

class INT_q_occupancy(Packet):
    name = "Queue Occupancy"

    fields_desc = [
        ByteField('q_id', 0),
        BitField('q_occupancy', 0, 24),
    ]

class INT_ingress_tstamp(Packet):
    name = "Ingress Timestamp"

    fields_desc = [
        IntField('ingress_global_timestamp', 0),
    ]
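
For reference, this is roughly what those two definitions mean on the wire, sketched without Scapy. It assumes network byte order and the same field widths as the classes above (1 byte queue ID, 24 bits occupancy, 32 bits timestamp); the function names are hypothetical.

```python
import struct


def decode_q_occupancy(word):
    """Decode one 4-byte queue occupancy entry: 8-bit ID, 24-bit occupancy."""
    if len(word) != 4:
        raise ValueError("queue occupancy entry must be 4 bytes")
    return {
        "q_id": word[0],
        "q_occupancy": int.from_bytes(word[1:4], "big"),
    }


def decode_ingress_tstamp(word):
    """Decode one 4-byte ingress timestamp entry (unsigned, big-endian)."""
    (tstamp,) = struct.unpack("!I", word)
    return {"ingress_global_timestamp": tstamp}


# Hand-built sample entries (hypothetical values).
print(decode_q_occupancy(bytes([3, 0x00, 0x01, 0x02])))
# {'q_id': 3, 'q_occupancy': 258}
print(decode_ingress_tstamp(struct.pack("!I", 123456)))
# {'ingress_global_timestamp': 123456}
```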

Then, let me show you the INT shim, the INT metadata header and the Telemetry Report too:

class INT_shim(Packet):
    name = "INT Shim Header"

    fields_desc = [
        ByteField('int_type', 0),
        ByteField('rsvd1', 0),
        ByteField('len', 0),
        BitField('dscp', 0, 6),
        BitField('rsvd2', 0, 2)
    ]

class INT_meta(Packet):
    name = "INT Metadata Header"

    fields_desc = [
        BitField('ver', 0, 4),
        BitField('rep', 0, 2),
        BitField('c', 0, 1),
        BitField('e', 0, 1),
        BitField('m', 0, 1),
        BitField('rsvd1', 0, 7),
        BitField('rsvd2', 0, 3),
        BitField('hop_metadata_len', 0, 5),
        ByteField('remaining_hop_cnt', 0),
        BitField('instruction_mask_0003', 0, 4),
        BitField('instruction_mask_0407', 0, 4),
        BitField('instruction_mask_0811', 0, 4),
        BitField('instruction_mask_1215', 0, 4),
        ShortField('rsvd3', 0),
    ]

class TelemetryReport(Packet):
    name = "INT telemetry report"

    fields_desc = [
        BitField("ver" , 1 , 4),
        BitField("len" , 4 , 4),
        BitField("nProto", 0, 3),
        BitField("repMdBits", 0, 6),
        BitField("rsvd", 0, 6),
        BitField("d", 0, 1),
        BitField("q", 0, 1),
        BitField("f", 0, 1),
        BitField("hw_id", 0, 6),
        IntField("switch_id", None),
        IntField("seq_no", None),
        IntField("ingress_tstamp", None)
    ]
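
As a cross-check of the bit widths, here is a small sketch that decodes the same 16-byte Telemetry Report layout with shifts and masks. It assumes exactly the field order and widths of the TelemetryReport class above (which may not match newer versions of the specification) and network byte order; the function name and sample values are hypothetical.

```python
import struct

# (name, width in bits) of the first 32-bit word, most significant field first.
FIRST_WORD_FIELDS = [
    ("ver", 4), ("len", 4), ("nProto", 3), ("repMdBits", 6), ("rsvd", 6),
    ("d", 1), ("q", 1), ("f", 1), ("hw_id", 6),
]


def decode_report_header(data):
    """Decode the 16-byte report header into a dict of integer fields."""
    if len(data) < 16:
        raise ValueError("truncated telemetry report header")
    word, switch_id, seq_no, ingress_tstamp = struct.unpack("!IIII", data[:16])
    fields = {}
    shift = 32
    for name, width in FIRST_WORD_FIELDS:
        shift -= width  # position of this field's lowest bit
        fields[name] = (word >> shift) & ((1 << width) - 1)
    fields["switch_id"] = switch_id
    fields["seq_no"] = seq_no
    fields["ingress_tstamp"] = ingress_tstamp
    return fields


# Hand-built sample: ver=1, len=4, d=1, hw_id=5, then three 32-bit values.
sample = struct.pack("!IIII", 0x14000105, 7, 42, 1000)
report = decode_report_header(sample)
print(report["ver"], report["len"], report["d"], report["hw_id"])  # 1 4 1 5
print(report["switch_id"], report["seq_no"])                       # 7 42
```

The widths in the first word add up to 32 bits, which together with the three 32-bit fields gives the 16 bytes used as INT_REPORT_HEADER_LENGTH in the handler.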

This is how I handled the packet; probably not the most efficient way, but it worked:

import sys
from datetime import datetime

from scapy.all import Ether, IP, ICMP, TCP, UDP, Raw

def handle_pkt(packet, conn, flows):

    info = { }
    print("Handling report.")

    info["rec_time"] = datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f")

    pkt = bytes(packet)
    #print("## PACKET RECEIVED ##")

    ICMP_PROTO = 1
    TCP_PROTO = 6
    UDP_PROTO = 17

    ETHERNET_HEADER_LENGTH = 14
    IP_HEADER_LENGTH = 20
    ICMP_HEADER_LENGTH = 8
    UDP_HEADER_LENGTH = 8
    TCP_HEADER_LENGTH = 20

    INT_REPORT_HEADER_LENGTH = 16
    INT_SHIM_LENGTH = 4
    INT_SHIM_WORD_LENGTH = 1
    INT_META_LENGTH = 8
    INT_META_WORD_LENGTH = 2

    OUTER_ETHERNET_OFFSET = 0
    OUTER_IP_HEADER = OUTER_ETHERNET_OFFSET + ETHERNET_HEADER_LENGTH
    OUTER_L4_HEADER_OFFSET = OUTER_IP_HEADER + IP_HEADER_LENGTH


    INNER_ETHERNET_OFFSET = INT_REPORT_HEADER_LENGTH
    INNER_IP_HEADER_OFFSET = INNER_ETHERNET_OFFSET + ETHERNET_HEADER_LENGTH
    INNER_L4_HEADER_OFFSET = INNER_IP_HEADER_OFFSET + IP_HEADER_LENGTH

    INT_SHIM_OFFSET = INT_REPORT_HEADER_LENGTH+\
                      ETHERNET_HEADER_LENGTH+\
                      IP_HEADER_LENGTH


    eth_report = Ether(pkt[0:ETHERNET_HEADER_LENGTH])
    #eth_report.show()

    ip_report = IP(pkt[OUTER_IP_HEADER:OUTER_IP_HEADER+IP_HEADER_LENGTH])
    #ip_report.show()

    udp_report = UDP(pkt[OUTER_L4_HEADER_OFFSET:OUTER_L4_HEADER_OFFSET+UDP_HEADER_LENGTH])
    #udp_report.show()

    raw_payload = bytes(packet[Raw]) # to get payload

    telemetry_report = TelemetryReport(raw_payload[0:INT_REPORT_HEADER_LENGTH])
    #telemetry_report.show()

    inner_eth = Ether(raw_payload[INNER_ETHERNET_OFFSET:INNER_ETHERNET_OFFSET+ETHERNET_HEADER_LENGTH])
    #inner_eth.show()

    inner_ip = IP(raw_payload[INNER_IP_HEADER_OFFSET : INNER_IP_HEADER_OFFSET+IP_HEADER_LENGTH])
    #inner_ip.show()

    info["ip_src"] = (inner_ip.src).strip("'")
    info["ip_dst"] = (inner_ip.dst).strip("'")
    info["ip_proto"] = inner_ip.proto

    info["port_dst"] = 0
    info["port_src"] = 0

    inner_tcp = None
    inner_udp = None

    if inner_ip.proto == ICMP_PROTO:
        INT_SHIM_OFFSET+=ICMP_HEADER_LENGTH
        inner_icmp = ICMP(raw_payload[INNER_L4_HEADER_OFFSET : INNER_L4_HEADER_OFFSET+ICMP_HEADER_LENGTH])
        #inner_icmp.show()
    elif inner_ip.proto == TCP_PROTO:
        INT_SHIM_OFFSET+=TCP_HEADER_LENGTH
        inner_tcp = TCP(raw_payload[INNER_L4_HEADER_OFFSET : INNER_L4_HEADER_OFFSET+TCP_HEADER_LENGTH])
        #inner_tcp.show()
        info["port_src"] = inner_tcp.sport
        info["port_dst"] = inner_tcp.dport
    elif inner_ip.proto == UDP_PROTO:
        INT_SHIM_OFFSET+=UDP_HEADER_LENGTH
        inner_udp = UDP(raw_payload[INNER_L4_HEADER_OFFSET : INNER_L4_HEADER_OFFSET+UDP_HEADER_LENGTH])
        #inner_udp.show()
        info["port_src"] = inner_udp.sport
        info["port_dst"] = inner_udp.dport
    else:
        return

    INT_META_OFFSET = INT_SHIM_OFFSET + INT_SHIM_LENGTH

    #print("SHIM OFFSET: "+str(INT_SHIM_OFFSET))

    int_shim = INT_shim(raw_payload[INT_SHIM_OFFSET : INT_SHIM_OFFSET+INT_SHIM_LENGTH])
    #int_shim.show()
    int_meta = INT_meta(raw_payload[INT_META_OFFSET : INT_META_OFFSET+INT_META_LENGTH])
    int_meta.show()

    INT_METADATA_STACK_OFFSET = INT_META_OFFSET + INT_META_LENGTH
    # This is the key variable, it will tell you how many bytes of the stack you need to extract
    INT_METADATA_STACK_LENGTH = (int_shim.len - INT_SHIM_WORD_LENGTH - INT_META_WORD_LENGTH) * 4

    stack_payload = raw_payload[INT_METADATA_STACK_OFFSET:INT_METADATA_STACK_OFFSET+INT_METADATA_STACK_LENGTH]

    info = extract_metadata_stack(stack_payload,
                                  INT_METADATA_STACK_LENGTH,
                                  int_meta.hop_metadata_len * 4,
                                  int_meta.instruction_mask_0003,
                                  int_meta.instruction_mask_0407,
                                  info)

    #Uncomment and magic happens
    #print(info)
    info["mon_id"] = get_flow_uuid(conn, info)

    insert_data_to_db(conn, info)

    sys.stdout.flush()
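
The arithmetic around INT_METADATA_STACK_LENGTH is the part that is easiest to get wrong, so here it is in isolation as a small sketch. It assumes, as in the handler above, that the shim length field counts 4-byte words and includes the shim (1 word) and the INT metadata header (2 words); the function name and the sample numbers are hypothetical.

```python
INT_SHIM_WORDS = 1   # shim header: 4 bytes
INT_META_WORDS = 2   # INT metadata header: 8 bytes
WORD_BYTES = 4


def stack_layout(shim_len_words, hop_metadata_len_words):
    """Return (stack_bytes, bytes_per_hop, number_of_hops)."""
    stack_bytes = (shim_len_words - INT_SHIM_WORDS - INT_META_WORDS) * WORD_BYTES
    hop_bytes = hop_metadata_len_words * WORD_BYTES
    if stack_bytes < 0:
        raise ValueError("shim length is smaller than the fixed headers")
    if hop_bytes == 0:
        # No per-hop metadata requested, so there is nothing to split.
        return stack_bytes, 0, 0
    if stack_bytes % hop_bytes != 0:
        raise ValueError("stack length is not a multiple of the hop length")
    return stack_bytes, hop_bytes, stack_bytes // hop_bytes


# Example: shim.len = 9 words, 2 words of metadata per hop.
print(stack_layout(9, 2))  # (24, 8, 3)
```

So a shim length of 9 words with 2 words per hop means 24 bytes of stack, that is, 3 hops of 8 bytes each.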

At this point, the info variable holds the INT metadata from all hops. If you uncomment print(info) you should be able to see the INT metadata stack. The way the stack is extracted is shown here:


def extract_0003_i0(b):
    return
def extract_0003_i1(b):
    return
#more of them until i10
def extract_0003_i10(b):
    data = {}
    s_id = INT_switch_id(b[0:4])
    s_id.show()
    hop_l = INT_hop_latency(b[4:8])
    hop_l.show()
    data["switch_id"] = s_id.switch_id
    data["hop_latency"] = hop_l.hop_latency
    return data
# more until finished
def extract_0003_i15(b):
    return


def extract_ins_00_03(instruction, b):
    if(instruction == 0):
        return extract_0003_i0(b)
    elif(instruction == 1):
        return extract_0003_i1(b)
    # more until i10, the one I decided to use for this example
    elif(instruction == 10):
        return extract_0003_i10(b)
    # more until the end of possible instructions (4 bits, 0 to 15)
    elif(instruction == 15):
        return extract_0003_i15(b)


def extract_metadata_stack(b, total_data_len, hop_m_len, instruction_mask_0003, instruction_mask_0407, info):

    numHops = total_data_len // hop_m_len  # integer division: range() needs an int

    info["instruction_mask_0003"] = instruction_mask_0003
    info["instruction_mask_0407"] = instruction_mask_0407
    info["data"] = {}

    #print("##[ INT Metadata Stack ]##")

    i=0
    for hop in range(numHops,0,-1):
        offset = i*hop_m_len
        #print("##[ Data from hop "+str(hop)+" ]##")
        info["data"]["hop_"+str(hop)] = {}
        if(instruction_mask_0003 != 0):
            data_0003 = extract_ins_00_03(instruction_mask_0003, b[offset:offset+hop_m_len])
            # unimplemented extractors return None, so fall back to an empty dict
            info["data"]["hop_"+str(hop)].update(data_0003 or {})

        if(instruction_mask_0407 != 0):
            data_0407 = extract_ins_04_07(instruction_mask_0407, b[offset:offset+hop_m_len])
            info["data"]["hop_"+str(hop)].update(data_0407 or {})

        i+=1

    return info
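
If you want to try the per-hop splitting without Scapy, the same idea can be sketched with a lookup table instead of the long if/elif chain. This is not my original code, only an illustration: the table holds just the one instruction value used above (10, switch ID plus hop latency, 4 bytes each), the first chunk in the stack gets the highest hop number as in the loop above, and all names here are hypothetical.

```python
import struct


def decode_switch_id_and_latency(chunk):
    """Instruction value 10 in this example: switch ID, then hop latency."""
    switch_id, hop_latency = struct.unpack("!II", chunk[:8])
    return {"switch_id": switch_id, "hop_latency": hop_latency}


# Hypothetical dispatch table; a full version would cover values 0 to 15.
DECODERS_0003 = {10: decode_switch_id_and_latency}


def split_stack(stack, hop_len, instruction):
    """Cut the stack into hop_len-sized chunks and decode each one."""
    if hop_len <= 0 or len(stack) % hop_len != 0:
        raise ValueError("stack length must be a positive multiple of hop_len")
    decoder = DECODERS_0003.get(instruction)
    num_hops = len(stack) // hop_len
    data = {}
    for i in range(num_hops):
        hop = num_hops - i  # first chunk in the stack gets the highest number
        chunk = stack[i * hop_len:(i + 1) * hop_len]
        data["hop_%d" % hop] = decoder(chunk) if decoder else {}
    return data


# Hand-built stack with two hops (hypothetical values).
stack = struct.pack("!IIII", 2, 150, 1, 90)
print(split_stack(stack, 8, 10))
# {'hop_2': {'switch_id': 2, 'hop_latency': 150}, 'hop_1': {'switch_id': 1, 'hop_latency': 90}}
```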

Consider that the Telemetry Report length or the INT metadata header length (to name two examples) depend on the version of the specification I implemented. You need to adjust most of the “constants”, like INT_META_WORD_LENGTH or INT_REPORT_HEADER_LENGTH, if my implementation does not fit your use case.

And of course, let me give you the whole file. It will be easier in the end. Ignore how I insert data into the database; I would probably use Influx now (MariaDB was easy at that point in time).

Cheers,