Hello,
I am studying and trying to reproduce the ALPM (Algorithmic Longest Prefix Match) example in P4 as a baseline algorithm for my research.
While reading the ALPM patent, I encountered a question about how the SRAM lookup stage (the second stage) actually works in hardware.
Background
According to the patent, the ALPM lookup consists of two stages:
-
TCAM stage (or ATCAM layer)
- Used to determine which SRAM partition should be accessed based on the high bits of the prefix.
-
SRAM stage (ALPM layer)
- Performs the actual longest prefix match within the selected partition.
My question is about the second stage:
How is the lookup performed inside an SRAM partition within a single pipeline stage?
Reasoning
It should not be a linear or sequential search, because:
-
If the partition contained hundreds of entries and the lookup were sequential, it would require hundreds of pipeline stages to compare them all, which is unrealistic.
-
Even in the example shown in the patent, if the process were sequential, the partition would need to be replicated multiple times (e.g., eight times for eight prefixes), which clearly doesn’t match how hardware is designed.
However, in the P4 source code provided by Intel, I only see the following definitions:
Alpm(number_partitions = 1024, subtrees_per_partition = 2) alpm;
Alpm(number_partitions = 1024, subtrees_per_partition = 2,
atcam_subset_width = 18, shift_granularity = 1) alpm_ipv4_subset_key;
Alpm(number_partitions = 1024, subtrees_per_partition = 2,
atcam_subset_width = 32, shift_granularity = 2) alpm_ipv6_subset_key;
My main questions
1.How is the lookup inside each SRAM partition actually implemented?
2.In hardware, how is the SRAM lookup completed within one pipeline stage?
I would really appreciate any clarification or insight about how ALPM performs the SRAM-stage lookup internally.
Any explanation or reference would be extremely helpful.
