P4/Tofino: Best practices for fallback metainfo matching (primary → secondary) without duplicate apply or heavy memory usage?

1418915702 · November 25, 2025, 9:08am

Hi,

I’m implementing a 3-stage packet classification on Tofino:
Stage-1: IP/Proto TCAM → outputs two metainfo values (primary, secondary).
Stage-2: SRC port matching: use the metainfo to look up either TCAM or SRAM.

Requirement: if the Stage-2 lookup using metainfo_primary fails, we should fall back and try metainfo_secondary. Hardware/compiler constraints:
P4 (Tofino) forbids multiple non-mutually-exclusive table.apply() calls on the same table instance.
Dynamic bit shifts (a runtime 1 << remainder) are not allowed on the target, so bitmap checks are performed by control-plane expansion (the control plane generates one exact entry per bitmap bit).

What I tried:

Control-plane duplication: for every rule I install identical entries under both GID_primary and GID_secondary. Works logically but doubles table entries.
Data-plane fallback: create src_tcam_table/src_sram_table for primary and src_tcam_table_secondary/src_sram_table_secondary for secondary, then check the primary tables first and the secondary tables second. Works, but needs extra tables.

Questions:

Are there common P4 patterns / idioms to implement a fallback over two group IDs without doubling table entries and without multiple apply calls on the same table?
Is it acceptable/normal to duplicate table keys in the control plane in production (memory vs correctness tradeoff)? Any suggestions to reduce footprint?
Can action selectors / indirect resources (action profiles) be used to express fallback semantics efficiently?
Any Tofino-specific recommendations for implementing a prioritized fallback metainfo lookup?

Complete ACL matching flow in my data plane

        if (table1.apply().hit) {
            
            // Calculate port quotient for SRAM lookup
            ig_md.src_quotient = p.udp.sport[15:5];  // High 11 bits
            ig_md.dst_quotient = p.udp.dport[15:5];

            // Stage-2
            bool tcam1_match = tcam_table1.apply().hit;
            bool sram1_match = sram_table1.apply().hit;
            bool stage2_match_primary = tcam1_match || sram1_match;
            
            // Try secondary metainfo tables if primary fails
            // (511 marks "no secondary metainfo").
            bool stage2_match_secondary = false;
            if (!stage2_match_primary && ig_md.metainfo_secondary != 511) {
                bool tcam2_match = tcam2_table_secondary.apply().hit;
                bool sram2_match = sram2_table_secondary.apply().hit;
                if (tcam2_match || sram2_match) {
                    stage2_match_secondary = true;
                }
            }
            
            bool stage2_match = stage2_match_primary || stage2_match_secondary;

            ...
            }
        } else {
            drop();
        }

Thanks in advance for any pointers or references to existing designs.

p4prof · November 25, 2025, 11:05am

Dear @1418915702 ,

How to perform a secondary lookup efficiently?

A typical idiom used in P4 to perform a lookup in a secondary table in case the lookup in the first table produces a miss is:

if (table1.apply().miss) {
    table2.apply();
}

As long as there are no dependencies between table1 and table2 (normally this means that the default action of table1 does not write into any field that can be written by table2), this construct can be compiled to occupy only one stage: both lookups are performed speculatively, and the result of the lookup in table2 is discarded if table1 hits. That’s exactly what the Tofino compiler will do.

Your code tries to simulate the same approach, but because it is more explicit it requires more than one stage. Otherwise, nothing prevents you from writing:

if (tcam_table1.apply().miss) {
   if (sram_table1.apply().miss) {
       if (tcam2_table_secondary.apply().miss) {
           if (sram2_table_secondary.apply().miss) {
               /* This is the REAL miss */
           }
       }
   }
}

and as long as there are no dependencies and there are enough resources, this will fit into one stage on Tofino.

Is it OK to duplicate the contents of the tables?

Generally speaking this is totally OK for as long as the tables fit.

Even though some architectures allow multiple lookups on the same table, logically these are still two different tables and this is reflected in the P4 language. For example, a typical L2 switching code might perform two lookups in the “L2 table”: once with the key composed of the Destination MAC and a VLAN ID, and then with the key composed of the Source MAC and the VLAN ID.

When coding this in P4 we need to use two tables: dmac table will be used to perform the lookup based on the Destination MAC and the VLAN and smac table is used to perform the lookup based on the Source MAC and the VLAN. If a certain target can perform two lookups in the same table, the compiler for that target can merge these tables together (automatically or following the specific instructions from the programmer), and if a target can’t do that, these tables will be placed separately. Note also, that the contents of these tables will not be “same”: for example, multicast addresses will be present only in the dmac table.
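
A minimal sketch of the two-table L2 pattern described above. It is a fragment of a control body rather than a complete program, and every name in it (ig_md.vlan_id, p.ethernet.*, the actions, the sizes) is an assumption made for illustration:

```p4
// Hypothetical fragment of an ingress control; not a complete TNA program.
action set_egress(bit<9> port) { ig_md.egress_port = port; }
action src_known()             { ig_md.learn = 0; }
action src_unknown()           { ig_md.learn = 1; }

// Lookup 1: (VLAN, Destination MAC) decides where the packet goes.
table dmac {
    key = {
        ig_md.vlan_id       : exact;
        p.ethernet.dst_addr : exact;
    }
    actions = { set_egress; NoAction; }
    const default_action = NoAction();
    size = 4096;
}

// Lookup 2: (VLAN, Source MAC) decides whether the source must be learned.
table smac {
    key = {
        ig_md.vlan_id       : exact;
        p.ethernet.src_addr : exact;
    }
    actions = { src_known; src_unknown; }
    const default_action = src_unknown();
    size = 4096;
}

apply {
    smac.apply();
    dmac.apply();
}
```

The two tables have the same key shape but different contents and different actions, which is why P4 treats them as separate tables.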

As for your specific example, it is not quite clear to me what exactly UDP Source and Destination ports are used for in your data plane algorithm, but given that the keys are relatively small (11 bits) the tables should be small as well.

Any suggestions on reducing the footprint?

If you use a lot of action data, then your hunch about using an action profile to store the action data separately and share it between tables is correct. It can be done as long as the tables fit into one stage and there are guarantees that only one of them will execute its action per packet (which is the case if you use the nested if() statement above).

This will also help in keeping the action data consistent across the tables.
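
A rough sketch of what sharing one action profile between the primary and the secondary table could look like in TNA. It is a fragment of a control body, and the table names, action, key fields and sizes are all assumptions. Only the action data is shared here; the match entries still exist once per table.

```p4
// Hypothetical fragment of an ingress control; not a complete TNA program.
action set_class(bit<8> class_id) {
    ig_md.class_id = class_id;
}

// One pool of action data, referenced by both tables.
ActionProfile(1024) shared_src_actions;

table src_tcam_primary {
    key = {
        ig_md.metainfo_primary : exact;
        p.udp.sport            : ternary;
    }
    actions = { set_class; }
    size = 512;
    implementation = shared_src_actions;
}

table src_tcam_secondary {
    key = {
        ig_md.metainfo_secondary : exact;
        p.udp.sport              : ternary;
    }
    actions = { set_class; }   // same action list as the primary table
    size = 512;
    implementation = shared_src_actions;
}

apply {
    // At most one of the two tables executes its action per packet.
    if (src_tcam_primary.apply().miss) {
        src_tcam_secondary.apply();
    }
}
```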

Shifting by a variable number of bits on Tofino

You are correct to mention that Tofino cannot perform such an operation natively; however, it can be easily simulated.

If you really need to perform a shift, then you can do that with a small match-action table that has N actions. The shift value will be the key. Hence, the entry with the key value 1 will have an action that shifts the desired argument by 1 bit, the entry with key 2 will perform the shift by 2 bits, etc. There are other optimizations that can reduce the number of actions, as they will become a critical resource.
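
The table-based shift can be sketched roughly as follows, assuming a 2-bit shift amount and 32-bit operands. The field names (ig_md.shift_amount, ig_md.value, ig_md.shifted) are hypothetical, and the block is a fragment of a control body:

```p4
// Hypothetical fragment: one action per constant shift distance.
action shl_0() { ig_md.shifted = ig_md.value; }
action shl_1() { ig_md.shifted = ig_md.value << 1; }
action shl_2() { ig_md.shifted = ig_md.value << 2; }
action shl_3() { ig_md.shifted = ig_md.value << 3; }

table var_shift {
    key = { ig_md.shift_amount : exact; }   // assumed bit<2>
    actions = { shl_0; shl_1; shl_2; shl_3; }
    const entries = {
        0 : shl_0();
        1 : shl_1();
        2 : shl_2();
        3 : shl_3();
    }
    const default_action = shl_0();
    size = 4;
}

apply {
    var_shift.apply();
}
```

When the value being shifted is the constant 1, as in 1 << remainder, a single action that takes the precomputed mask as action data should be enough, with one entry per shift amount.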

If all you need to do with the bitmap is to check whether bit N is set or clear, then it is very easy to do with the TCAM.
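
One way the TCAM bit test might look, assuming an 8-bit bitmap and a 3-bit bit index held in hypothetical metadata fields. Each entry unmasks exactly one bit of the bitmap:

```p4
// Hypothetical fragment of a control body.
action bit_is_set()   { ig_md.bit_set = 1; }   // ig_md.bit_set assumed bit<1>
action bit_is_clear() { ig_md.bit_set = 0; }

table test_bit {
    key = {
        ig_md.bit_index : exact;     // N, assumed bit<3>
        ig_md.bitmap    : ternary;   // assumed bit<8>; only bit N is compared
    }
    actions = { bit_is_set; bit_is_clear; }
    const entries = {
        (0, 0x01 &&& 0x01) : bit_is_set();
        (1, 0x02 &&& 0x02) : bit_is_set();
        (2, 0x04 &&& 0x04) : bit_is_set();
        (3, 0x08 &&& 0x08) : bit_is_set();
        (4, 0x10 &&& 0x10) : bit_is_set();
        (5, 0x20 &&& 0x20) : bit_is_set();
        (6, 0x40 &&& 0x40) : bit_is_set();
        (7, 0x80 &&& 0x80) : bit_is_set();
    }
    // No entry matches when bit N is 0, so the default action reports "clear".
    const default_action = bit_is_clear();
    size = 8;
}
```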

Having said that, I noticed that you didn’t mention how such an operation could be helpful in your case. If you can elaborate, then we can see whether one of the approaches I just described might help.

Last, but not least, if you are satisfied with my answers to your previous questions, please, acknowledge that and mark your question as answered. This will help other people on the forum, not to mention that it is a lot more fun to write a detailed reply knowing that it will not go into the void and be left unanswered.

Happy hacking,
Vladimir

1418915702 · November 25, 2025, 12:18pm

Dear Vladimir,

Thank you very much for your detailed explanation.
I realized that my original description was too confusing, so let me clarify my exact scenario with a much cleaner explanation.
My actual pipeline structure

After reviewing my design, the real logic is as follows:

Stage 1

A table lookup produces an action that contains two possible values:

DG8
DG0

Both values are needed for the next stage.
So the action of stage 1 outputs something like:
{ DG8 , DG0 }

Stage 2 (two tables inside the same stage)

In Stage 2, I have two tables:

TCAM table
SRAM table

The lookup key used in stage 2 is:
(DG8 or DG0) + processed_SRC_port

The real issue I am facing

Right now, to make DG0 + SRC_port match correctly in Stage 2,
my workaround has been:

To duplicate the tables (or create additional tables) in Stage 2, so that I can explicitly perform a second lookup using DG0.

However, this causes the number of tables to grow significantly.
I would prefer not to duplicate tables if there is a better P4 idiom or Tofino capability to express this logic.

What I actually want to express

I want to perform something logically equivalent to:
Attempt lookup with (DG8 + SRC_port).
If this misses, attempt lookup with (DG0 + SRC_port).
But I want:

both attempts to stay within the same stage,
without explicitly duplicating tables,
and ideally letting the compiler perform speculative parallel lookups if possible.

So my question is:

**What is the recommended way in P4/Tofino to express “use two possible keys (DG8/SRCport and DG0/SRCport) when only the second one may match, without duplicating tables”?**

Should this be done by:

two separate tables,
one table with two key fields,
using action profiles,
or some other standard approach?

Any guidance on the best practice here would be extremely helpful.
Also, I have marked all previous questions as answered.
Thank you again for your help and your very detailed explanations!

Best regards

p4prof · November 25, 2025, 6:59pm

Dear @1418915702 ,

Thank you for providing additional details.

If I understand you correctly, by “the action of stage 1 outputs something like: { DG8 , DG0 }” you mean that the first table assigns different values to two metadata variables (e.g. gid1 and gid2), for example 8 and 0 if RID is equal to C3, or 1 and 0 if RID is equal to R1. Is that correct?

If so, you can create a TCAM table that matches on gid1, gid2 and udp.src_port, and then make sure that the entries that match on gid1 and udp.src_port (while ignoring gid2) have higher priority than those that match on gid2 and udp.src_port (while ignoring gid1).

This method, however, might be expensive and is not applicable to exact match tables.
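
A sketch of the single-TCAM approach, assuming 9-bit group IDs and hypothetical names throughout. The const entries are only there to illustrate the priority ordering (for const entries the earlier one wins); in practice the control plane would install the entries with explicit priorities.

```p4
// Hypothetical fragment of a control body.
action src_hit(bit<8> class_id) { ig_md.class_id = class_id; }

table src_port_match {
    key = {
        ig_md.gid1  : ternary;   // primary group ID, assumed bit<9>
        ig_md.gid2  : ternary;   // secondary group ID, assumed bit<9>
        p.udp.sport : ternary;
    }
    actions = { src_hit; NoAction; }
    const entries = {
        // Higher priority: match on gid1 and the port, ignore gid2.
        (8, _, 0x0400 &&& 0xFC00) : src_hit(1);
        // Lower priority: match on gid2 and the port, ignore gid1.
        (_, 0, 0x0050 &&& 0xFFFF) : src_hit(2);
    }
    const default_action = NoAction();
    size = 1024;
}

apply {
    // One lookup covers both the primary and the fallback attempt.
    src_port_match.apply();
}
```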