Question about the actual input byte layout for Tofino hash when using multiple fields

Hi,

I have a question about the actual input format used by the Tofino hash function when multiple fields are provided as hash inputs.

As an example, during packet parsing, I place parsed header contents into the following fields:

struct ig_packet_parsed_header_t {
    bit<8>   field_1_1;
    bit<16>  field_1_2;
    bit<32>  field_1_3;
    bit<64>  field_1_4;

    bit<8>   field_2_1;
    bit<16>  field_2_2;
    bit<32>  field_2_3;

    bit<8>   field_3_1;
    bit<16>  field_3_2;
    bit<32>  field_3_3;
}

For different packets, some of these fields may not be populated by the parser, while others always contain valid values.
I then use these fields as inputs to a hash function, as shown below:

CRCPolynomial<bit<16>>(0x8005, true, false, false, 16w0x0000, 16w0xFFFF) crc16_usb;
Hash<bit<16>>(HashAlgorithm_t.CUSTOM, crc16_usb) hash_16;
bit<16> value_1 = hash_16.get({field_1_1, field_1_2, field_1_3, field_1_4,
                               field_2_1, field_2_2, field_2_3,
                               field_3_1, field_3_2, field_3_3});

Assume that for a given packet, the parser does not populate field_1_4, field_2_3, field_3_1, and field_3_3 (i.e., these fields contain no valid parsed data). The other fields have the following values:

bit<8>   field_1_1 = 0x01
bit<16>  field_1_2 = 0x0203
bit<32>  field_1_3 = 0x04050607
bit<64>  field_1_4  // not populated

bit<8>   field_2_1 = 0x08
bit<16>  field_2_2 = 0x090A
bit<32>  field_2_3  // not populated

bit<8>   field_3_1  // not populated
bit<16>  field_3_2 = 0x0B0C
bit<32>  field_3_3  // not populated

My questions are:

  1. What is the actual byte sequence fed into the hash function?
  • Is it a compact concatenation of only the populated fields, for example:
01 02 03 04 05 06 07 08 09 0A 0B 0C
  • Or a reversed byte order (little-endian / network-order transformation), such as:
0C 0B 0A 09 08 07 06 05 04 03 02 01
  • Or are the unused fields padded with zeros according to their declared bit widths, for example:
01 02 03 04 05 06 07 00 00 00 00 00 00 00 00
08 09 0A 00 00 00 00
00 0B 0C 00 00 00 00
  2. I have tried to verify the result using an online CRC calculator (Sunshine's Homepage - Online CRC Calculator Javascript), but I cannot obtain results consistent with the P4 hash output. This might be due to incorrect CRC settings. Could someone clarify how the six parameters of CRCPolynomial correspond to the configuration options in the online CRC calculator?

Any clarification or references would be greatly appreciated.

Thanks!

Dear @LongP4,

Let me give you a couple of quick answers first.

  1. As per the P4 specification, if you try to read from a variable (in this case a structure field) before it has been written, you will read an indeterminate value. Please see Section 8.25, "Reading uninitialized values and writing fields of invalid headers".
  2. By default, the bits are fed into the hash engine exactly in the order specified in the program. In other words, imagine the bit string obtained by concatenating the fields you mentioned (e.g., using P4's ++ operator). That is the hash engine's input.
  3. However, Tofino hash engines are much more powerful than that: all aspects of their behavior, including the order in which the bits are fed, the polynomial, and the final XOR value, are also controllable at run time. The default, though, is whatever you specified in the program.
  4. The Barefoot Runtime Interface also provides a way to obtain the expected hash engine output programmatically.
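To cross-check this off-device, a parameterized reflected CRC-16 in Python can reproduce what an online calculator computes. This is a minimal sketch; the reading of the six CRCPolynomial arguments as (coeff, reversed, msb, extended, init, xor) is an assumption to verify against tofino.p4, under which the program above corresponds to polynomial 0x8005, reflected input/output, initial value 0x0000, and final XOR 0xFFFF:

```python
def reflect(value: int, width: int) -> int:
    """Reverse the bit order of `value` over `width` bits."""
    out = 0
    for _ in range(width):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out

def crc16(data: bytes, poly: int = 0x8005,
          init: int = 0x0000, xorout: int = 0xFFFF) -> int:
    """Bit-reflected CRC-16 (refin=true, refout=true), processed LSB-first."""
    rpoly = reflect(poly, 16)          # 0x8005 reflected -> 0xA001
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ rpoly if crc & 1 else crc >> 1
    return crc ^ xorout

# With init=0x0000 and xorout=0xFFFF these settings match the published
# CRC-16/MAXIM-DOW parameter set, whose standard check value is 0x44C2.
print(hex(crc16(b"123456789")))  # 0x44c2
```

Note that the variable name crc16_usb may be misleading: the commonly published CRC-16/USB parameter set uses an initial value of 0xFFFF, while the program passes 16w0x0000, so an online calculator's CRC-16/USB preset would not match.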

You can also find a lot of detailed information on how the hash engines work, how to program them and how to use all these facilities for a variety of purposes in the course ICA-XFG-203: Action profiles, selectors, and traffic distribution. This course is available for purchase at P4ica Archives.

Happy hacking,
Vladimir

Dear @p4prof,
Thank you for your reply. Following your suggestion, I printed the values of these fields using a digest and found that the unused field_1_4 is, unexpectedly, not zero; instead it equals the value of field_1_3, as shown below:

digest - hash value: 24642 (0x6042)
digest - field_1_1: 118 (0x76)
digest - field_1_2: 0 (0x00000000)
digest - field_1_3: 1768187247 (0x6964656F)
digest - field_1_4: 1768187247 (0x000000006964656F)

I observed that this behavior is consistent across repeated recompilations and executions.

Here I would like to clarify that the fields in the above example are actually defined as headers, as shown below:

// P4 identifiers may not begin with a digit, so the headers are named
// bytes_N_t here; the 1-, 2-, and 4-byte headers are defined analogously.
header bytes_8_t {
    bit<64> value;
}

struct ig_packet_parsed_header_t {
    bytes_1_t  field_1_1;
    bytes_2_t  field_1_2;
    bytes_4_t  field_1_3;
    bytes_8_t  field_1_4;

    bytes_1_t  field_2_1;
    bytes_2_t  field_2_2;
    bytes_4_t  field_2_3;

    bytes_1_t  field_3_1;
    bytes_2_t  field_3_2;
    bytes_4_t  field_3_3;
}

After reviewing the code, I confirmed that field_1_4 is indeed never used. The parser states involving these fields are shown below:

state decision {
    packet.extract(p.decision);
    transition select(p.decision.decision) {
        1: parse_5_Bytes;
        2: parse_8_Bytes;
        default: reject;
    }
}

state parse_5_Bytes {
    packet.extract(p.field_1_1);
    packet.extract(p.field_1_3);
    transition accept;
}

state parse_8_Bytes {
    packet.extract(p.field_1_4);
    transition accept;
}

The parse_5_Bytes and parse_8_Bytes states are mutually exclusive, so field_1_4 should not contain the value of field_1_3.

To work around this issue, I added the following check:

if (!p.field_1_4.isValid()) {
    p.field_1_4.setValid();
    p.field_1_4.value = 0;
}

After doing so, I was able to obtain the correct hash value:

digest - hash value: 48930 (0xBF22)
digest - field_1_1: 118 (0x76)
digest - field_1_2: 0 (0x00000000)
digest - field_1_3: 1768187247 (0x6964656F)
digest - field_1_4: 0 (0x0000000000000000)

However, the actual design contains many such fields (headers), and adding a conditional check for each unused one would result in a large number of if statements. Is there a better or more general way to handle undefined values in unused fields (headers)?

Dear @LongP4,

I am happy to hear that the root cause has been confirmed and that you resolved the issue (albeit in a laborious way). The section of the spec that I mentioned is crucial for many compiler optimizations and should not be ignored.

I am not quite sure what you are trying to achieve in the first place, or whether you rely on the validity of those headers anywhere else. Perhaps it would be easier to mark all of them valid and pre-initialize all of them with zeroes? That would allow you to eliminate the individual checks… Alternatively, you can use a pre-initialized struct and fill it with the data using the .lookahead() method.
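For the .lookahead() route, a minimal sketch might look like the following. It assumes field_1_4 has been changed from a header to a plain bit<64> struct member that is zeroed in the start state; the names and widths follow the earlier parser excerpt, but the overall arrangement is an assumption:

```p4
state parse_8_Bytes {
    // field_1_4 is assumed to be a plain bit<64> struct member that was
    // zero-initialized in start. lookahead() reads without advancing the
    // cursor, so we advance past the 64 bits explicitly.
    p.field_1_4 = packet.lookahead<bit<64>>();
    packet.advance(64);
    transition accept;
}
```

Because every member is written in start, any field that a given packet does not populate keeps its zero value, and no per-field isValid() checks are needed before hashing.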

Happy hacking,
Vladimir

Dear @p4prof,

Thank you very much for your reply. It was very helpful, and it made me realize that I can initialize the headers and set default values for their fields directly in the parser's start state:

state start {
    // ...
    p.field_1_4.setValid();
    p.field_1_4.value = 0;
    // ...
}

After applying this initialization, I can consistently obtain the correct hash value.
