Can Open P4 Studio display memory utilization for table entries, and how can I check it?

I’m working with a P4 programmable switch (Intel Tofino-based) and using Open P4 Studio (SDE). I need to monitor the memory usage of my match-action tables, specifically the TCAM and SRAM utilization at the physical memory level, not just the number of entries installed.

I am aware that I can check the current number of entries via bfshell or bfrt commands like pm show table, but that only gives the logical occupancy (entries used vs. table size). What I really want is to see how much of the underlying TCAM and SRAM hardware blocks are actually occupied – for example, the percentage of TCAM banks used or SRAM slices consumed across different pipes.

I’ve tried using pm resource usage in ucli, which provides some high-level resource usage per pipe, but I’m not entirely sure if that reflects the real memory consumption of table entries, especially when tables have different widths or when multiple tables share the same memory pool.

My questions are:

  1. Does Open P4 Studio (or the underlying BfRt/bfshell) provide a way to query the actual physical memory occupancy (TCAM/SRAM) by table entries?

  2. If yes, what are the exact commands or APIs to retrieve this information?

  3. Are there any differences between viewing per-table memory usage vs. overall chip memory usage?

  4. Could anyone share examples or scripts that can help monitor these metrics dynamically?

Thanks in advance!

Dear @1418915702,

The number of SRAM and TCAM blocks (as well as other resources) occupied by each table is determined during the program compilation and does not change afterwards. If these are the numbers you are after, the information can be easily obtained.

If you have access to the original Intel P4 Studio SDE (the latest version being SDE-9.13.4), it includes a tool called P4Insight (p4i) that shows this in great detail (as well as a lot of other useful information).

If you have access to Open-P4Studio only, you will need to rely on the compiler log files, which contain most of this information in either text or JSON format.

Make sure you compile your program with the -g command-line parameter and then look inside the output directory (myprog.tofino). You will see a directory named pipe/logs that contains all the compiler log files. Note that the name pipe is actually the name of your Pipeline() package instance, so if you named it differently, adjust the path accordingly.

Here is what you’ll see:

$ bf-p4c -g myprog.p4
$ ls myprog.tofino/pipe/logs/
flexible_packing.log     phv_allocation_3.log          resources_deparser.json
mau.characterize.log     phv_allocation_6.log          table_dependency_graph.log
mau.json                 phv_allocation_history_0.log  table_dependency_summary.log
mau.resources.log        phv_allocation_history_3.log  table_placement_1.log
metrics.json             phv_allocation_history_6.log  table_placement_2.log
pa.characterize.log      phv_allocation_summary_0.log  table_placement_4.log
pa.results.log           phv_allocation_summary_3.log  table_placement_5.log
parser.characterize.log  phv_allocation_summary_6.log  table_placement_7.log
parser.log               power.json                    table_summary.log
phv.json                 pragmas.log
phv_allocation_0.log     resources.json

The main files that contain the information you are looking for are:

  • mau.characterize.log (text version) / mau.json (machine-readable JSON)
  • mau.resources.log
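Since several of these logs are JSON, a small script can help you explore them. The following is only a minimal sketch: it assumes nothing beyond the directory layout shown above (myprog.tofino/pipe/logs) and deliberately does not assume any schema inside the files, since I would not rely on it being identical across SDE versions. The function name summarize_json_logs is made up for illustration. It just reports the top-level structure of each JSON log so you can decide where to dig further.

```python
import json
from pathlib import Path


def summarize_json_logs(logs_dir):
    """Return {file name: top-level keys (or a short note)} for each *.json log.

    No schema is assumed: the function only reports what is actually
    present in the files found in logs_dir.
    """
    summary = {}
    for path in sorted(Path(logs_dir).glob("*.json")):
        try:
            with path.open() as f:
                data = json.load(f)
        except (OSError, json.JSONDecodeError) as exc:
            # Keep going if one file is missing, truncated or malformed.
            summary[path.name] = f"unreadable: {exc}"
            continue
        if isinstance(data, dict):
            summary[path.name] = sorted(data.keys())
        else:
            summary[path.name] = f"top-level {type(data).__name__}"
    return summary


if __name__ == "__main__":
    # Adjust "pipe" if your Pipeline() instance is named differently.
    for name, keys in summarize_json_logs("myprog.tofino/pipe/logs").items():
        print(name, "->", keys)
```

Run it from the directory where you invoked the compiler; once you know which keys a file contains, you can extend the script to pull out the per-table numbers you care about.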

Please note that this information is static: when entries are placed into the tables at run time, the resources for those tables have already been allocated and thus the entries do not occupy any additional resources beyond that. The actual placement (physical addresses) of the individual entries is transparent, depends on a lot of circumstances, and there is no easy way to get it. Most importantly, it should not matter.

Just to re-iterate, you do not need to actually run the program to see this information – you only need to compile it.

Happy hacking,
Vladimir

PS: You might consider making your handle a little more human-readable, so that we can address you by name and not by some random number :)

Hi Vladimir,

Thanks for the detailed explanation — that helped a lot.

I actually do have access to P4Insight (p4i), but I’m still a bit unsure how to interpret the memory usage there.

In the Match Table view, I can see the key width (in bits) and the number of entries. My current understanding is:

  • If I multiply “bit width × number of entries”, I get the logical memory requirement.

However, I’m not sure whether this corresponds to the actual physical memory usage (TCAM/SRAM blocks), since I assume the hardware allocates memory in fixed-size blocks with alignment and packing constraints.

So my question is:

Should I instead look at the MAU / resource allocation view (e.g., TCAM blocks, SRAM blocks per table) to understand the real hardware memory consumption?

Or is there a more direct way in p4i to map a table to its exact physical memory footprint?

Thanks again for your help!

Dear @1418915702 (Wonx3),

To properly interpret memory usage you need to have at least some understanding of Tofino hardware. This will help you to understand why tables are mapped onto its hardware resources in a certain way. For this, I recommend you take some of the courses of Intel Connectivity Academy by P4ica, such as ICA-XFG-101, ICA-1141 and ICA-1142.

Suffice it to say that the resources are allocated not with bit or entry precision, but in certain “quanta”.

For example, SRAM blocks in Tofino contain 1024 128-bit words, while TCAM blocks contain 512 44-bit ternary words. Hence, every table placed in SRAM is allocated in multiples of 1024 entries (at least): even if you ask for 100 entries, you will still get at least 1024. In reality, exact match tables are implemented as multi-way hash tables, and each bucket in each of the W ways can contain N entries (N=1..8). Thus, the minimum table size will be W*N*1024 entries. The compiler examines a lot of combinations to find an optimal one. This explains the difference between the number of requested and allocated entries.
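To make the arithmetic concrete, here is a rough sketch of the quanta described above. The block sizes come from this post; the function names are hypothetical, and the TCAM estimate assumes the key packs perfectly into 44-bit words with no overhead, which is a simplification. The compiler's real allocation can be larger, so treat the results as lower bounds, not as what bf-p4c will actually report.

```python
import math

SRAM_BLOCK_WORDS = 1024  # words per SRAM block
TCAM_BLOCK_WORDS = 512   # words per TCAM block
TCAM_WORD_BITS = 44      # key bits per TCAM word


def min_exact_match_entries(ways, entries_per_bucket):
    """Smallest exact-match table size: W ways * N entries per bucket * 1024."""
    if ways < 1:
        raise ValueError("at least one way is required")
    if not 1 <= entries_per_bucket <= 8:
        raise ValueError("N is expected to be in 1..8")
    return ways * entries_per_bucket * SRAM_BLOCK_WORDS


def tcam_blocks_lower_bound(key_bits, entries):
    """Idealized TCAM block count: blocks side by side for the key width,
    stacked for the depth. Assumes perfect packing of the key (simplification).
    """
    wide = math.ceil(key_bits / TCAM_WORD_BITS)
    deep = math.ceil(entries / TCAM_BLOCK_WORDS)
    return wide * deep


if __name__ == "__main__":
    # A 4-way table with 2 entries per bucket cannot be smaller than this,
    # even if the P4 program asks for only 100 entries.
    print(min_exact_match_entries(ways=4, entries_per_bucket=2))
    # A 64-bit ternary key with 1000 entries: 2 blocks wide, 2 blocks deep.
    print(tcam_blocks_lower_bound(key_bits=64, entries=1000))
```

The point of the sketch is only to show why small changes in the requested size or key width can leave the allocation unchanged or make it jump by a whole block.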

Another example is the entry packing itself. An n-bit-wide key can occupy anywhere from 0 bits (when we can use the so-called Action Data (indexed) table) to n*8 bits, depending on tons of factors, starting with the table type and the specific allocation of key fields in the PHV (n*8 is a very extreme example that does not happen except in very specifically written cases). An m-bit-wide data field can also occupy anywhere between m and m*8 bits. Some constants used inside actions can also occupy memory. This explains the difference between the theoretical entry width (bits per entry) and the actual one.

In summary, it is impossible to predict the details of the actual allocation, and hence the exact resource consumption, unless you compile the program. P4i can show you the results, and with a good understanding of Tofino hardware and allocation strategies these results can be explained and reasoned about. That’s all.
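As a quick illustration of how wide that range is, the sketch below computes the bounds quoted above for a single entry and compares them with the naive "key bits + data bits" figure. The function name is hypothetical, and the model ignores action constants and everything else the compiler takes into account, so it only shows the spread, not a prediction.

```python
def entry_bits_bounds(key_bits, data_bits):
    """Return (min_bits, max_bits) for one entry under the bounds above:
    key occupies 0 .. 8*n bits, data occupies m .. 8*m bits.
    Constants used inside actions are ignored (simplification).
    """
    if key_bits < 0 or data_bits < 0:
        raise ValueError("widths must be non-negative")
    min_bits = 0 + data_bits
    max_bits = 8 * key_bits + 8 * data_bits
    return min_bits, max_bits


if __name__ == "__main__":
    n, m = 32, 16  # example: 32-bit key, 16-bit action data
    low, high = entry_bits_bounds(n, m)
    print(f"naive estimate: {n + m} bits per entry")
    print(f"possible range: {low} to {high} bits per entry")
```

With such a spread, multiplying the bit width by the number of entries is not a reliable estimate of the physical footprint.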

Happy hacking,
Vladimir