Hi all,
I have already implemented the basic functionality of a P4 program on Tofino2, including IP lookup and port matching with a TCAM + SRAM bucket/bitmap design, and the program works correctly.
My current problem is that the stage usage is too high, and I want to optimize it.
In particular, on the SRAM path I use the high bits of the port to select a bucket and the low 5 bits to test one bit in that bucket's 32-bit bitmap. Right now this bit test is implemented as a long if-else chain that selects bitmap[0] … bitmap[31], once for the source port and once for the destination port. (The SRAM-side bitmap check is shown in the figure below.)
Code:
```p4
// -------------------------------------------------------------------
// Step 2: SRC Port Lookup (Stage 2)
// -------------------------------------------------------------------
// Try SRC TCAM path
src_tcam_table.apply();
// Try SRC SRAM path (parallel with TCAM)
src_sram_table.apply();
// Bitmap membership test for SRAM path
if (ig_md.src_sram_bucket_hit) {
    // Extract bitmap bit at position src_remainder using compact if-else
    if (ig_md.src_remainder == 5w0) ig_md.src_bitmap_bit = ig_md.src_bitmap[0:0];
    else if (ig_md.src_remainder == 5w1) ig_md.src_bitmap_bit = ig_md.src_bitmap[1:1];
    else if (ig_md.src_remainder == 5w2) ig_md.src_bitmap_bit = ig_md.src_bitmap[2:2];
    else if (ig_md.src_remainder == 5w3) ig_md.src_bitmap_bit = ig_md.src_bitmap[3:3];
    else if (ig_md.src_remainder == 5w4) ig_md.src_bitmap_bit = ig_md.src_bitmap[4:4];
    else if (ig_md.src_remainder == 5w5) ig_md.src_bitmap_bit = ig_md.src_bitmap[5:5];
    else if (ig_md.src_remainder == 5w6) ig_md.src_bitmap_bit = ig_md.src_bitmap[6:6];
    else if (ig_md.src_remainder == 5w7) ig_md.src_bitmap_bit = ig_md.src_bitmap[7:7];
    else if (ig_md.src_remainder == 5w8) ig_md.src_bitmap_bit = ig_md.src_bitmap[8:8];
    else if (ig_md.src_remainder == 5w9) ig_md.src_bitmap_bit = ig_md.src_bitmap[9:9];
    else if (ig_md.src_remainder == 5w10) ig_md.src_bitmap_bit = ig_md.src_bitmap[10:10];
    else if (ig_md.src_remainder == 5w11) ig_md.src_bitmap_bit = ig_md.src_bitmap[11:11];
    else if (ig_md.src_remainder == 5w12) ig_md.src_bitmap_bit = ig_md.src_bitmap[12:12];
    else if (ig_md.src_remainder == 5w13) ig_md.src_bitmap_bit = ig_md.src_bitmap[13:13];
    else if (ig_md.src_remainder == 5w14) ig_md.src_bitmap_bit = ig_md.src_bitmap[14:14];
    else if (ig_md.src_remainder == 5w15) ig_md.src_bitmap_bit = ig_md.src_bitmap[15:15];
    else if (ig_md.src_remainder == 5w16) ig_md.src_bitmap_bit = ig_md.src_bitmap[16:16];
    else if (ig_md.src_remainder == 5w17) ig_md.src_bitmap_bit = ig_md.src_bitmap[17:17];
    else if (ig_md.src_remainder == 5w18) ig_md.src_bitmap_bit = ig_md.src_bitmap[18:18];
    else if (ig_md.src_remainder == 5w19) ig_md.src_bitmap_bit = ig_md.src_bitmap[19:19];
    else if (ig_md.src_remainder == 5w20) ig_md.src_bitmap_bit = ig_md.src_bitmap[20:20];
    else if (ig_md.src_remainder == 5w21) ig_md.src_bitmap_bit = ig_md.src_bitmap[21:21];
    else if (ig_md.src_remainder == 5w22) ig_md.src_bitmap_bit = ig_md.src_bitmap[22:22];
    else if (ig_md.src_remainder == 5w23) ig_md.src_bitmap_bit = ig_md.src_bitmap[23:23];
    else if (ig_md.src_remainder == 5w24) ig_md.src_bitmap_bit = ig_md.src_bitmap[24:24];
    else if (ig_md.src_remainder == 5w25) ig_md.src_bitmap_bit = ig_md.src_bitmap[25:25];
    else if (ig_md.src_remainder == 5w26) ig_md.src_bitmap_bit = ig_md.src_bitmap[26:26];
    else if (ig_md.src_remainder == 5w27) ig_md.src_bitmap_bit = ig_md.src_bitmap[27:27];
    else if (ig_md.src_remainder == 5w28) ig_md.src_bitmap_bit = ig_md.src_bitmap[28:28];
    else if (ig_md.src_remainder == 5w29) ig_md.src_bitmap_bit = ig_md.src_bitmap[29:29];
    else if (ig_md.src_remainder == 5w30) ig_md.src_bitmap_bit = ig_md.src_bitmap[30:30];
    else ig_md.src_bitmap_bit = ig_md.src_bitmap[31:31];
    if (ig_md.src_bitmap_bit == 1) {
        ig_md.src_sram_bitmap_hit = true;
    }
}
```
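
To make the intended behavior unambiguous, here is a minimal software reference model of the SRAM-path membership test, written in Python rather than P4. It is only a sketch for checking results, not Tofino code: the 16-bit port width, the `buckets` dictionary standing in for the SRAM table, and the helper names `split_port`, `build_buckets`, and `bitmap_hit` are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple

REMAINDER_BITS = 5                 # low bits of the port that index the bitmap
BITMAP_BITS = 1 << REMAINDER_BITS  # 32-bit bitmap per bucket


def split_port(port: int) -> Tuple[int, int]:
    """Split a port into (bucket index, bit position inside the bitmap)."""
    bucket = port >> REMAINDER_BITS        # high bits select the bucket
    remainder = port & (BITMAP_BITS - 1)   # low 5 bits select the bit
    return bucket, remainder


def build_buckets(ports: Iterable[int]) -> Dict[int, int]:
    """Hypothetical stand-in for the SRAM table: bucket index -> 32-bit bitmap."""
    buckets: Dict[int, int] = {}
    for port in ports:
        bucket, remainder = split_port(port)
        buckets[bucket] = buckets.get(bucket, 0) | (1 << remainder)
    return buckets


def bitmap_hit(buckets: Dict[int, int], port: int) -> bool:
    """Equivalent of the if-else chain: bucket lookup, then one-bit test."""
    bucket, remainder = split_port(port)
    bitmap = buckets.get(bucket)
    if bitmap is None:                     # bucket miss: no bitmap to test
        return False
    return ((bitmap >> remainder) & 1) == 1


if __name__ == "__main__":
    table = build_buckets([80, 443, 8080])
    for p in (80, 81, 443, 1000):
        print(p, split_port(p), bitmap_hit(table, p))
```

The whole 32-way chain is equivalent to the single expression `(bitmap >> remainder) & 1`, so any optimized data-plane version can be checked against this model packet by packet.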
I would like to ask:
- Is this bitmap bit-selection logic likely to be a major reason for high stage usage?
- What is the recommended way to optimize this kind of bitmap membership test on Tofino2?
- Are there common design patterns to reduce stage count for TCAM + SRAM + bitmap pipelines?
Any suggestions would be very helpful. Thanks!

