Hello everyone,
I am implementing a HyperLens-like ACL packet classification pipeline on Intel Tofino2 using P4 and BF-RT. I am testing the forwarding behavior with ClassBench-generated ACL rule sets and packet traces.
I have run into a strange issue: after adding src_tcam_table.apply() to the ingress pipeline, the switch starts dropping a large, stable fraction of packets. This happens even when the SRC TCAM table is empty. I would appreciate any advice on how to further locate and fix this issue.
1. Overall framework
My packet classification pipeline has two main paths.
Path A: five-tuple TCAM
This table is used for high-priority or fallback five-tuple rules.
if (five_tuple_tcam.apply().hit) {
ig_md.five_tuple_hit = true;
return;
}
The table action can directly forward packets to port 2/0, whose dev port is 144.
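The corresponding action (from the full program below) just sets the unicast egress port:
action forward_to_port2() {
    ig_tm_md.ucast_egress_port = 9w144;
}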
Path B: main path
The main pipeline is organized as follows:
- ip_table: matches SrcIP / DstIP / Proto and produces Group_id.
- src_tcam_table / src_sram_table: matches Group_id + src_port and produces Group_id2.
- dst_tcam_table / dst_sram_table: matches Group_id2 + dst_port and decides the final forwarding result.
The simplified logical flow is:
five_tuple_tcam.apply();
ip_table.apply(); // produces ig_md.Group_id
src_tcam_table.apply(); // key = Group_id exact + l4_src_port ternary
src_sram_table.apply(); // bitmap-based source port path
// choose Group_id2
dst_tcam_table.apply();
dst_sram_table.apply();
// final forward or drop
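The "choose Group_id2" step, taken from the full program below, prefers the TCAM result over the SRAM bitmap result:
if (ig_md.src_tcam_hit) {
    ig_md.Group_id2 = ig_md.src_gid2_from_tcam;
} else if (ig_md.src_sram_bitmap_hit) {
    ig_md.Group_id2 = ig_md.src_gid2_from_sram;
}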
For throughput testing, I set THROUGHPUT_TEST_MODE = true, so packets should be forwarded to port 2/0 instead of being dropped on table misses.
2. Packet processing flow
The L4 ports are normalized from TCP or UDP headers:
if (p.tcp.isValid()) {
ig_md.l4_src_port = p.tcp.sport;
ig_md.l4_dst_port = p.tcp.dport;
} else if (p.udp.isValid()) {
ig_md.l4_src_port = p.udp.sport;
ig_md.l4_dst_port = p.udp.dport;
} else {
ig_md.l4_src_port = 16w0;
ig_md.l4_dst_port = 16w0;
}
The port quotient and remainder are also precomputed for SRAM bitmap lookup:
ig_md.src_quotient = ig_md.l4_src_port[15:5];
ig_md.src_remainder = ig_md.l4_src_port[4:0];
ig_md.dst_quotient = ig_md.l4_dst_port[15:5];
ig_md.dst_remainder = ig_md.l4_dst_port[4:0];
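For example, source port 80 (0x0050) gives src_quotient = 2 and src_remainder = 16, so the SRAM path looks up bucket (Group_id, 2) and then tests bit 16 of the returned bitmap.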
The problematic table is:
table src_tcam_table {
key = {
ig_md.Group_id : exact;
ig_md.l4_src_port : ternary;
}
actions = {
set_src_gid2_from_tcam;
nop;
}
default_action = nop;
size = 12283;
}
Here, ig_md.Group_id is written by ip_table and then immediately used as a key field in src_tcam_table, which creates a match dependency: the compiler must place src_tcam_table in a later stage than ip_table.
3. Problem observed
When I execute only direct forwarding, traffic is forwarded normally:
forward_to_port2();
return;
When I execute only the five-tuple table and IP table, forwarding is also normal:
if (five_tuple_tcam.apply().hit) {
return;
}
ip_table.apply();
forward_to_port2();
return;
However, once I add:
src_tcam_table.apply();
the switch starts losing a large number of packets.
The observed behavior is roughly:
- Ingress port 1/0 RX: about 300 million packets
- Egress port 2/0 TX: about 70 million packets fewer than RX
The packet loss amount is relatively stable across repeated tests.
4. Tests already performed
Test 1: Direct forwarding only
P4 logic:
forward_to_port2();
return;
Result: forwarding is normal.
This suggests that the physical input and output ports, traffic generator, and basic forwarding path are working.
Test 2: five-tuple table + IP table
P4 logic:
if (five_tuple_tcam.apply().hit) {
return;
}
ip_table.apply();
forward_to_port2();
return;
Result: forwarding is normal.
This suggests that five_tuple_tcam and ip_table alone do not cause the packet loss.
Test 3: Add src_tcam_table.apply()
P4 logic:
if (five_tuple_tcam.apply().hit) {
return;
}
ip_table.apply();
src_tcam_table.apply();
forward_to_port2();
return;
Result: packet loss appears.
Test 4: Empty SRC TCAM table
I cleared src_tcam_table via BF-RT Python and confirmed:
- src_tcam_table Usage = 0 / Capacity = 5526
- dump shows no entries
But the P4 pipeline still executes:
src_tcam_table.apply();
Result: packet loss still appears.
This suggests that the issue is probably not caused by SRC TCAM entry content, priority, mask, or action data.
Test 5: SRC TCAM hit action directly forwards
I also modified the SRC TCAM table so that the hit action directly forwards the packet:
table src_tcam_table {
key = {
ig_md.Group_id : exact;
ig_md.l4_src_port : ternary;
}
actions = {
forward_to_port2;
nop;
}
default_action = nop;
size = 5526;
}
The BF-RT control plane was also changed to install:
make_data([], "SwitchIngress.forward_to_port2")
Then I tested the following minimal path:
ip_table.apply();
if (src_tcam_table.apply().hit) {
return;
}
forward_to_port2();
return;
Result: similar packet loss still appears.
This suggests that the issue is probably not simply “SRC TCAM hit but no egress port was assigned”.
5. Directions already ruled out
Based on the above tests, the issue does not seem to be caused by:
- wrong SRC TCAM priority;
- wrong SRC TCAM entries;
- missing fallback forwarding after a SRC TCAM miss;
- a mismatch between the P4 action and the BF-RT action;
- GID2 propagation to later stages;
- DST TCAM / DST SRAM logic, because the minimal test reproduces the issue before reaching the DST stage;
- output port 2/0 itself, because direct forwarding and five-tuple + IP forwarding work correctly.
The current symptom is: as soon as src_tcam_table.apply() is included in the pipeline, a stable portion of packets is lost, even when the table is empty.
6. What kind of modification would be appropriate?
I would like to know how to further debug or redesign this part.
In particular, should I split the SRC lookup from one large mixed exact/ternary TCAM table into something like:
- a small SRC exception TCAM: key = Group_id exact + l4_src_port ternary
- a large SRC default exact table: key = Group_id exact
The reason is that in my generated SRC table, most entries are wildcard source-port entries, while only a small number are true exception entries with specific source ports. Therefore, using a large TCAM table for all SRC entries may be unnecessary.
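Concretely, the split I have in mind looks roughly like this (table sizes are placeholders, not measured requirements):
table src_exception_tcam {
    key = {
        ig_md.Group_id    : exact;
        ig_md.l4_src_port : ternary;
    }
    actions = { set_src_gid2_from_tcam; nop; }
    default_action = nop;
    size = 512;    // placeholder: only the true exception entries
}
table src_default_table {
    key = { ig_md.Group_id : exact; }
    actions = { set_src_gid2_from_tcam; nop; }
    default_action = nop;
    size = 16384;  // placeholder: wildcard source-port entries, SRAM-backed
}
with the exception TCAM applied first and the exact table only on a miss:
if (!src_exception_tcam.apply().hit) {
    src_default_table.apply();
}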
I would also like to know whether the following debugging steps are reasonable:
1. Keep the original key, but reduce src_tcam_table size from 5526 or 12283 to 64.
2. Use only l4_src_port : ternary as the key and install one wildcard entry.
3. Use only Group_id : exact as the key.
4. Set Group_id = 0 manually before src_tcam_table.apply() to cut the dependency from ip_table to src_tcam_table.
5. Add counters before and after src_tcam_table.apply() (see the sketch after this list):
- ingress_start
- after_ip_table
- after_src_tcam
- before_final_forward
If before_final_forward is close to RX but port 2/0 TX is still lower, I would conclude that the loss happens in the TM / egress / queueing path rather than in the P4 ingress logic.
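As a minimal sketch of step 5, assuming one standalone TNA Counter instance per checkpoint (separate instances so each extern is accessed at most once per packet; all names are placeholders):
Counter<bit<64>, bit<1>>(1, CounterType_t.PACKETS) cnt_ingress_start;
Counter<bit<64>, bit<1>>(1, CounterType_t.PACKETS) cnt_after_ip_table;
Counter<bit<64>, bit<1>>(1, CounterType_t.PACKETS) cnt_after_src_tcam;
Counter<bit<64>, bit<1>>(1, CounterType_t.PACKETS) cnt_before_final_forward;
// In the apply block, count at each checkpoint:
cnt_ingress_start.count(0);
if (five_tuple_tcam.apply().hit) { return; }
ip_table.apply();
cnt_after_ip_table.count(0);
src_tcam_table.apply();
cnt_after_src_tcam.count(0);
cnt_before_final_forward.count(0);
forward_to_port2();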
Does this debugging plan make sense?
If this issue is related to the src_tcam_table key structure, table size, metadata dependency, or Tofino2 table placement, what P4 table structure would you recommend?
Thanks in advance for any suggestions or replies.
Full P4 program
#include <core.p4>
#if __TARGET_TOFINO__ == 2
#include <t2na.p4>
#else
#include <tna.p4>
#endif
// ---------------------------------------------------------------------------
// Headers
// ---------------------------------------------------------------------------
typedef bit<48> MacAddress;
typedef bit<32> IPv4Address;
#define GID2_WIDTH 14
typedef bit<GID2_WIDTH> gid2_t;
header ethernet_h {
MacAddress dst;
MacAddress src;
bit<16> etherType;
}
header ipv4_h {
bit<4> version;
bit<4> ihl;
bit<8> tos;
bit<16> len;
bit<16> id;
bit<3> flags;
bit<13> frag;
bit<8> ttl;
bit<8> proto;
bit<16> chksum;
IPv4Address src;
IPv4Address dst;
}
header tcp_h {
bit<16> sport;
bit<16> dport;
bit<32> seq;
bit<32> ack;
bit<4> dataofs;
bit<4> reserved;
bit<8> flags;
bit<16> window;
bit<16> chksum;
bit<16> urgptr;
}
header udp_h {
bit<16> sport;
bit<16> dport;
bit<16> len;
bit<16> chksum;
}
// ---------------------------------------------------------------------------
// struct
// ---------------------------------------------------------------------------
struct headers {
ethernet_h ethernet;
ipv4_h ipv4;
udp_h udp;
tcp_h tcp;
}
// user defined metadata
struct user_metadata_t {
// Stage 1: IP lookup result
bit<16> Group_id;
bit<16> Group_id_secondary;
// Stage 2: SRC port lookup
gid2_t Group_id2;
gid2_t src_gid2_from_tcam;
gid2_t src_gid2_from_sram;
// Stage 3: DST port lookup
bit<9> five_tuple_action;
bit<9> dst_action_from_tcam;
bit<9> dst_action_from_sram;
// Normalized L4 ports
bit<16> l4_src_port;
bit<16> l4_dst_port;
// Port quotient/remainder for SRAM bucket indexing
bit<11> src_quotient;
bit<11> dst_quotient;
bit<5> src_remainder;
bit<5> dst_remainder;
// SRAM bitmaps
bit<32> src_bitmap;
bit<32> dst_bitmap;
// Bucket hit vs bitmap hit distinction
bool src_sram_bucket_hit;
bool src_sram_bitmap_hit;
bit<1> src_bitmap_bit;
bool dst_sram_bucket_hit;
bool dst_sram_bitmap_hit;
bit<1> dst_bitmap_bit;
// Hit flags for path tracking
bool ip_table_hit;
bool src_tcam_hit;
bool src_sram_hit;
bool dst_tcam_hit;
bool dst_sram_hit;
bool five_tuple_hit;
// Debug: drop reason code
bit<8> drop_reason;
}
struct eg_metadata_t {
}
// ---------------------------------------------------------------------------
// Ingress Parser
// ---------------------------------------------------------------------------
parser SwitchIngressParser(
packet_in pkt,
out headers p,
out user_metadata_t ig_md,
out ingress_intrinsic_metadata_t ig_intr_md) {
state start {
pkt.extract(ig_intr_md);
pkt.advance(PORT_METADATA_SIZE);
// Initialize metadata
ig_md.Group_id = 0;
ig_md.Group_id_secondary = 0;
ig_md.Group_id2 = 0;
ig_md.five_tuple_action = 0;
ig_md.src_gid2_from_tcam = 0;
ig_md.src_gid2_from_sram = 0;
ig_md.dst_action_from_tcam = 0;
ig_md.dst_action_from_sram = 0;
ig_md.l4_src_port = 0;
ig_md.l4_dst_port = 0;
ig_md.src_quotient = 0;
ig_md.dst_quotient = 0;
ig_md.src_remainder = 0;
ig_md.dst_remainder = 0;
ig_md.src_bitmap = 0;
ig_md.dst_bitmap = 0;
ig_md.src_sram_bucket_hit = false;
ig_md.src_sram_bitmap_hit = false;
ig_md.src_bitmap_bit = 0;
ig_md.dst_sram_bucket_hit = false;
ig_md.dst_sram_bitmap_hit = false;
ig_md.dst_bitmap_bit = 0;
ig_md.ip_table_hit = false;
ig_md.src_tcam_hit = false;
ig_md.src_sram_hit = false;
ig_md.dst_tcam_hit = false;
ig_md.dst_sram_hit = false;
ig_md.five_tuple_hit = false;
ig_md.drop_reason = 0;
transition parse_ethernet;
}
state parse_ethernet {
pkt.extract(p.ethernet);
transition select(p.ethernet.etherType) {
0x800: parse_ip;
default: accept;
}
}
state parse_ip {
pkt.extract(p.ipv4);
transition select(p.ipv4.proto) {
6: parse_tcp;
17: parse_udp;
default: accept;
}
}
state parse_udp {
pkt.extract(p.udp);
transition accept;
}
state parse_tcp {
pkt.extract(p.tcp);
transition accept;
}
}
// ---------------------------------------------------------------------------
// Ingress
// ---------------------------------------------------------------------------
control SwitchIngress(
inout headers p,
inout user_metadata_t ig_md,
in ingress_intrinsic_metadata_t ig_intr_md,
in ingress_intrinsic_metadata_from_parser_t ig_prsr_md,
inout ingress_intrinsic_metadata_for_deparser_t ig_dprsr_md,
inout ingress_intrinsic_metadata_for_tm_t ig_tm_md) {
const bool THROUGHPUT_TEST_MODE = true;
bit<16> vrf;
// ========== Common Actions ==========
action drop() {
ig_dprsr_md.drop_ctl = 0x1;
}
action forward(bit<9> port) {
ig_tm_md.ucast_egress_port = port;
}
action forward_to_port2() {
ig_tm_md.ucast_egress_port = 9w144;
}
action nop() {
}
// ========== Parallel Path: 5-Tuple TCAM ==========
action set_five_tuple_result(bit<9> egress_port) {
ig_md.five_tuple_action = egress_port;
ig_md.five_tuple_hit = true;
}
table five_tuple_tcam {
key = {
vrf : exact;
p.ipv4.src : ternary;
p.ipv4.dst : ternary;
p.ipv4.proto : exact;
ig_md.l4_src_port : ternary;
ig_md.l4_dst_port : ternary;
}
actions = {
forward_to_port2;
nop;
}
default_action = nop;
size = 1;
}
// ========== Stage-1: IP Table ==========
action set_gid1(bit<16> gid1) {
ig_md.Group_id = gid1;
ig_md.ip_table_hit = true;
}
table ip_table {
key = {
vrf : exact;
p.ipv4.src : ternary;
p.ipv4.dst : ternary;
p.ipv4.proto : exact;
}
actions = {
set_gid1;
nop;
}
default_action = nop;
size = 12283;
}
// ========== Stage-2: SRC Port Matching ==========
action set_src_gid2_from_tcam(gid2_t gid2) {
ig_md.src_gid2_from_tcam = gid2;
ig_md.src_tcam_hit = true;
}
table src_tcam_table {
key = {
ig_md.Group_id : exact;
ig_md.l4_src_port : ternary;
}
actions = {
set_src_gid2_from_tcam;
nop;
}
default_action = nop;
size = 12283;
}
action set_src_sram_bucket(bit<32> bitmap, gid2_t gid2) {
ig_md.src_bitmap = bitmap;
ig_md.src_gid2_from_sram = gid2;
ig_md.src_sram_bucket_hit = true;
}
table src_sram_table {
key = {
ig_md.Group_id : exact;
ig_md.src_quotient : exact;
}
actions = {
set_src_sram_bucket;
nop;
}
default_action = nop;
size = 1;
}
// ========== Stage-3: DST Port Matching ==========
action set_dst_action_from_tcam(bit<9> egress_port) {
ig_md.dst_action_from_tcam = egress_port;
ig_md.dst_tcam_hit = true;
}
table dst_tcam_table {
key = {
ig_md.Group_id2 : ternary;
ig_md.l4_dst_port : ternary;
}
actions = {
forward_to_port2;
nop;
}
default_action = nop;
size = 9149;
}
action set_dst_sram_bucket(bit<32> bitmap, bit<9> egress_port) {
ig_md.dst_bitmap = bitmap;
ig_md.dst_action_from_sram = egress_port;
ig_md.dst_sram_bucket_hit = true;
}
table dst_sram_table {
key = {
ig_md.Group_id2 : exact;
ig_md.dst_quotient : exact;
}
actions = {
forward_to_port2;
nop;
}
default_action = nop;
size = 5714;
}
// Bitmap selector actions and tables: they simply map remainder 0..31 to the
// corresponding bitmap bit. The full source enumerates all 32 remainder
// values; a condensed sketch of the src selector (only the first four bit
// actions spelled out) is shown below, and dst_bitmap_select_table follows
// the same pattern on dst_remainder / dst_bitmap.
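action src_bit_0() { ig_md.src_bitmap_bit = ig_md.src_bitmap[0:0]; }
action src_bit_1() { ig_md.src_bitmap_bit = ig_md.src_bitmap[1:1]; }
action src_bit_2() { ig_md.src_bitmap_bit = ig_md.src_bitmap[2:2]; }
action src_bit_3() { ig_md.src_bitmap_bit = ig_md.src_bitmap[3:3]; }
// ... src_bit_4 through src_bit_31 follow the same pattern ...
table src_bitmap_select_table {
key = { ig_md.src_remainder : exact; }
actions = { src_bit_0; src_bit_1; src_bit_2; src_bit_3; nop; }
default_action = nop;
size = 32;
}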
// ========== Apply Block ==========
apply {
vrf = 16w0;
if (p.tcp.isValid()) {
ig_md.l4_src_port = p.tcp.sport;
ig_md.l4_dst_port = p.tcp.dport;
} else if (p.udp.isValid()) {
ig_md.l4_src_port = p.udp.sport;
ig_md.l4_dst_port = p.udp.dport;
} else {
ig_md.l4_src_port = 16w0;
ig_md.l4_dst_port = 16w0;
}
ig_md.src_quotient = ig_md.l4_src_port[15:5];
ig_md.src_remainder = ig_md.l4_src_port[4:0];
ig_md.dst_quotient = ig_md.l4_dst_port[15:5];
ig_md.dst_remainder = ig_md.l4_dst_port[4:0];
if (five_tuple_tcam.apply().hit) {
ig_md.five_tuple_hit = true;
return;
}
if (!ip_table.apply().hit) {
ig_md.drop_reason = 8w1;
if (THROUGHPUT_TEST_MODE) {
forward_to_port2();
return;
} else {
ig_dprsr_md.drop_ctl = 0x1;
return;
}
}
src_tcam_table.apply();
src_sram_table.apply();
if (ig_md.src_sram_bucket_hit) {
src_bitmap_select_table.apply();
if (ig_md.src_bitmap_bit == 1) {
ig_md.src_sram_bitmap_hit = true;
}
}
bool src_match = false;
if (ig_md.src_tcam_hit) {
ig_md.Group_id2 = ig_md.src_gid2_from_tcam;
src_match = true;
} else if (ig_md.src_sram_bitmap_hit) {
ig_md.Group_id2 = ig_md.src_gid2_from_sram;
src_match = true;
}
if (!src_match) {
ig_md.drop_reason = 8w2;
if (THROUGHPUT_TEST_MODE) {
forward_to_port2();
return;
} else {
ig_dprsr_md.drop_ctl = 0x1;
return;
}
}
dst_tcam_table.apply();
dst_sram_table.apply();
if (ig_md.dst_sram_bucket_hit) {
dst_bitmap_select_table.apply();
if (ig_md.dst_bitmap_bit == 1) {
ig_md.dst_sram_bitmap_hit = true;
}
}
bool dst_match = false;
if (ig_md.dst_tcam_hit) {
forward_to_port2();
dst_match = true;
} else if (ig_md.dst_sram_bitmap_hit) {
forward_to_port2();
dst_match = true;
}
if (!dst_match) {
ig_md.drop_reason = 8w3;
if (THROUGHPUT_TEST_MODE) {
forward_to_port2();
return;
} else {
ig_dprsr_md.drop_ctl = 0x1;
return;
}
}
ig_md.drop_reason = 8w0;
}
}
// ---------------------------------------------------------------------------
// Ingress Deparser
// ---------------------------------------------------------------------------
control SwitchIngressDeparser(
packet_out pkt,
inout headers p,
in user_metadata_t ig_md,
in ingress_intrinsic_metadata_for_deparser_t ig_dprsr_md) {
apply {
pkt.emit(p);
}
}
// ---------------------------------------------------------------------------
// Egress Parser
// ---------------------------------------------------------------------------
parser SwitchEgressParser(
packet_in pkt,
out headers p,
out eg_metadata_t eg_md,
out egress_intrinsic_metadata_t eg_intr_md) {
state start {
pkt.extract(eg_intr_md);
transition parse_ethernet;
}
state parse_ethernet {
pkt.extract(p.ethernet);
transition select(p.ethernet.etherType) {
0x800: parse_ip;
default: accept;
}
}
state parse_ip {
pkt.extract(p.ipv4);
transition select(p.ipv4.proto) {
6: parse_tcp;
17: parse_udp;
default: accept;
}
}
state parse_tcp {
pkt.extract(p.tcp);
transition accept;
}
state parse_udp {
pkt.extract(p.udp);
transition accept;
}
}
// ---------------------------------------------------------------------------
// Egress
// ---------------------------------------------------------------------------
control SwitchEgress(
inout headers p,
inout eg_metadata_t meta,
in egress_intrinsic_metadata_t eg_intr_md,
in egress_intrinsic_metadata_from_parser_t eg_prsr_md,
inout egress_intrinsic_metadata_for_deparser_t eg_dprsr_md,
inout egress_intrinsic_metadata_for_output_port_t eg_oport_md) {
apply {
}
}
// ---------------------------------------------------------------------------
// Egress Deparser
// ---------------------------------------------------------------------------
control SwitchEgressDeparser(
packet_out pkt,
inout headers p,
in eg_metadata_t meta,
in egress_intrinsic_metadata_for_deparser_t eg_dprsr_md)
{
apply {
pkt.emit(p);
}
}
// ---------------------------------------------------------------------------
// Pipeline
// ---------------------------------------------------------------------------
Pipeline(SwitchIngressParser(),
SwitchIngress(),
SwitchIngressDeparser(),
SwitchEgressParser(),
SwitchEgress(),
SwitchEgressDeparser()) pipe;
Switch(pipe) main;