Match-Action Table with multiple types of keys

I ran into a situation that I’m not sure how resource allocation would work/look like. If I have a match action table that does a lookup with multiple keys using different types of matches say:

table sample {
        key = {
            hdr.field1 : lpm;
            hdr.field2 : exact;
            hdr.field3 : lpm;
        }
        actions = {
            a1;
            a2;
        }
    }

How would the TCAM/SRAM resources be allocated and what would the lookup table look like? How would it perform lookups with multiple fields? Would it perform a lookup on field1 first, and then field2, and then field3? Or would it somehow do all at the same time in parallel. I’m having trouble visualizing multi key lookups and how the final match is determined. Thank you!

Semantically, I do not think it makes much sense to have more than one field with match kind lpm in a table, because it literally means “longest prefix match”. If there is a single lpm field, and perhaps one or more other exact match fields, but no other match kinds, then it makes sense to say “among all the entries that have exact matches on the exact match fields, find all entries with a matching prefix on the lpm field, and the longest installed prefix is the winner”.

That does not generalize to multiple lpm fields, though, and I have not seen a P4 target that implements multiple lpm fields for the same table.

If you were to replace lpm with ternary in your example, then a common way to implement such a thing would be to put all three fields into a TCAM, but simply restrict the control plane to only install exact match masks for hdr.field2. That is NOT the only way to implement it correctly, and it is target-specific how exactly it is implemented in the hardware, with a few other variations possible than that simple approach.

1 Like

Thank you this helps a lot!

Wanted to follow up and confirm if I was understanding it properly: so multiple ternary matches would work but multiple lpm would not? I’m trying to perform some basic 5 tuple packet classification and I want to create a match action that involves all 5 fields as 5 keys for the classification rules. My initial thought was to just create a P4 table for it that uses 5 lpm types of key lookups. What is the suggested way for doing so?

Also, I noticed there is a range type of match. Would multiple range keys be supported and how are range lookups usually implemented in terms of RAM or CAM? Thank you as always.

To clarify, I found this old post in Github in one of your replies here (Questions on behavior of a few kinds of P4 tables in BMv2 · Issue #698 · p4lang/behavioral-model · GitHub):

I think I’m just a bit confused about how range entries are stored and done through hardware. I can see that ternary/lpm probably use TCAM w/ wildcards and masks while exact can use TCAM or SRAM hash lookups. I also understand the case where if one key is ternary and one is lpm, the ternary priority will be chosen as the implementation and so the lpm will be treated as a ternary field in a way. However, I fail to see how this generalizes to why range’s priority is the greatest.

In the case where you have one range key, one ternary key, and one lpm key, how will the ternary and lpm be converted to ranges and how will that be possibly implemented? Thank you!

lpm literally is an abbreviation for “longest prefix match”.

If you want the match behavior to include the length of the prefix as the criteria for determining the “winner” among multiple matching entries, then I personally do not see how that makes any sense at all if you allow a table with one lpm key field, and another key field that is any of lpm, ternary, or range. How would you define which entry should “win” among multiple possible matching entries? I have never seen anyone propose such a rule.

If you have such a proposed rule, then great, propose one and we can discuss it. If not, then let us NOT discuss a table with one lpm key field, and another key field that is any of lpm, ternary, range, or optional. That leaves you with the restriction that if one key field is lpm, then either there are no other key fields at all, or all others have exact match kinds, and I have described in my comment above how that makes perfectly good sense (to me, at least), in how you can define it.

If you have key fields where some are ternary, some are range, some are optional, and some are exact (but none of them are lpm), then the rule for whether a lookup key matches an entry installed by the control plane is that the condition for each field of the lookup key must match the ternary, optional, range or exact condition installed by the control plane for that one field, and that must be true for all lookup key fields, not only some of them. That determines whether the lookup key matches one installed entry.

Now conceptually imagine that you determine for each lookup key, the complete set of all installed entries that match that lookup key. There might be 0, 1, or more than one of them. If 0, then the lookup is a miss. If more than one, then every entry must have a numeric priority installed with it when the control plane added the entry, and that priority determines which of the multiple matching entries is the winner, and only the winning entry’s action is executed.

There is usually more than one way to implement any given idea, but one way to implement range matches is also to use a TCAM. If the range happens to be a “power of 2 aligned range”, i.e. it looks like [A x 2^B, (A+1) x 2^B - 1] for some integers A and B, then that range can be implemented with one TCAM value/mask, that will be a prefix mask, i.e. a mask that is exact match on some number of the most significant bits, and wildcard on the least significant bits.

For a range that is not “power of 2 aligned”, you can always break it up into multiple ranges that are “power of 2 aligned”, e.g. the range [0, 5] is [0, 3] plus [4, 5], and each of those smaller ranges is power of 2 aligned.

Suppose that you had a P4 table with field f1 that was range match kind, and field f2 was ternary match kind. If the control plane added a rule where f1 was range [0, 5], and field f2 was value V with mask M, then a system that used power-of-2-aligned TCAM entries to implement the range match kind would install 2 TCAM entries for this control plane rule:

(1) f1 has the value/mask that matches range [0,3], f2 has value/mask V, M
(2) f1 has the value/mask that matches range [4,5], f2 has value/mask V, M

Any lookup key that matches the original control plane rule must match exactly one of those two TCAM entries.

For a single field that was match kind range, implemented in this way, the number of TCAM entries required depends upon the precise range you want to match, but it turns out that for a W-bit field, the maximum number of TCAM entries required is 2*W-2.

There can be a scaling issue with this, however, if you have 2 or more fields that are both match kind range: you have to do this breaking up into power-of-2-aligned ranges for all of the fields, and then install as many TCAM entries as are in the “cross product” of each field by itself.

For example, suppose field f1 and f2 are both match kind range, and the control plane installs a rule where the range for field f1, by itself, requires 7 power-of-2-aligned ranges to represent it with TCAM value/masks, and field f2, by itself, requires 10 power-of-2-aligned ranges to represent it with TCAM value/masks. Then adding that one rule from the control plane would require 7*10 TCAM entries to represent all possible pairs of value/masks for field f1 and f2.

I mentioned above that there is usually more than one way to solve problems. There are specially designed hardware TCAMs that can support [min, max] ranges directly in the hardware. These are typically more expensive and custom than a regular TCAM hardware design. Basically inside of the TCAM logic for each entry, you can mark specific sets of bits as part of a range field, and instead of configuring a (value,mask) for those bits and the hardware does the calculation (lookup_key & mask) == value when the TCAM is searched, instead it does the calculation (min_value <= lookup_key) && (lookup_key <= max_value). Such special TCAM designs typically can do that for more than one field, but it requires that the hardware designer took into account this desire, and added extra logic gates into every single TCAM entry that could implement it. So it is a bigger TCAM design for the same number of entries, and thus more expensive than a “vanilla” TCAM that can only do value/mask matches.

2 Likes

Thank you for the in depth response. I think my initial reasoning behind inquiring about the situation involving one lpm key field with another key field of lpm, ternary, or range was I thought in a case in which lpm loses its “sense” and doesn’t make sense, it would still be allowed albeit it would lose its “l” longest meaning and be reduced to/treated as just a ternary entry w/ a mask followed by all wildcards. So 255.255.255.255/16 would become FFFF0000,0xFFFF0000 and be treated as a normal ternary pair of value/mask. But I also see now why it doesn’t make sense to support such a thing because lpm completely loses its intended meaning.

As a followup to the scaling issue you mentioned in which a single range check may be implemented as multiple TCAM entries depending on the range, in P4 does the table size field refer strictly to the amount of rules specified by the user or the max amount of TCAM entries that can be created by the system. For example, if I specify a table to have size 16, would the 16 mean I can specify explicitly at most 16 rules or however many I specify, the number of actual TCAM entries cannot exceed 16? Which in the case of multiple range types I guess I would have to be careful about specifying the table size.

As a small nitpick, does the table size include the default action? or would the total size be size+1

I think if someone wanted something like a ternary field, but you knew for some reason it was going to be restricted to always be a prefix mask, it would be somewhat clearer and more explicit to have a new match kind like prefix that could be mixed and matched with all other match kinds, but which required an explicit priority for each entry provided by the control plane when the entry was added. That would hopefully make it clearer that the winning entry had nothing to do with the prefix length.

The size property on tables is often taken in implementations I am aware of to be ‘up to this size, but perhaps less depending upon issues like the range implementation described, or hash collisions in multi-way hash table implementations of exact match tables, etc.’. Thus it is not really an easily determined predictable capacity, but an ideal one if things go well.

There are many classification algorithms often called “algorithmic TCAM” that do not use TCAM hardware, but implement similar classification methods, that are also usually quite unpredictable in how much memory they will use – it depends upon the precise set of rules you install.

On the nitpick question, I don’t think this is mandated by any P4 language specification – I would check with the vendor of your device if it matters to you.

1 Like

An article linked from the P4_16 language specification on the meaning of the size table property: p4-spec/p4-table-and-parser-value-set-sizes.md at main · p4lang/p4-spec · GitHub. (the link is in the section named “Size” that is a subsection of the “Table properties” section)

1 Like