Hi all, I’m fairly new to P4 and still wrapping my head around certain parts. My understanding is that the P4 compiler tries to exploit parallelism wherever it can. For example, if multiple lookups can run in parallel within one Match-Action Unit (MAU), the compiler will take care of that.
I was wondering what typically happens when there is only one lookup table (either TCAM or SRAM) that is very large. To keep it vendor-neutral: suppose a vendor design has x MAUs and y TCAM blocks per MAU. What happens when I create one P4 lookup table that uses only lpm matching and fill it with more entries than fit in y TCAM blocks?
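To put rough numbers on the scenario, here is a small sketch of the arithmetic I have in mind. Every figure in it is made up for illustration (the block depth, the value of y, and the table size are hypothetical, not taken from any vendor), and it assumes the lpm key is narrow enough to fit the width of a single TCAM block.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating point."""
    return -(-a // b)


def tcam_blocks_needed(entries: int, entries_per_block: int) -> int:
    """Blocks required to hold `entries`, assuming one entry per TCAM row."""
    return ceil_div(entries, entries_per_block)


def fits_in_one_mau(entries: int, entries_per_block: int, blocks_per_mau: int) -> bool:
    """True if the whole table fits in the TCAM blocks of a single MAU."""
    return tcam_blocks_needed(entries, entries_per_block) <= blocks_per_mau


# Hypothetical numbers, chosen only to illustrate the question.
ENTRIES_PER_BLOCK = 512   # assumed depth of one TCAM block
BLOCKS_PER_MAU = 24       # "y" in the question
TABLE_ENTRIES = 20_000    # size of the single lpm table

blocks = tcam_blocks_needed(TABLE_ENTRIES, ENTRIES_PER_BLOCK)
print(f"blocks needed: {blocks}, available in one MAU: {BLOCKS_PER_MAU}")
print("fits in one MAU:", fits_in_one_mau(TABLE_ENTRIES, ENTRIES_PER_BLOCK, BLOCKS_PER_MAU))
```

With these made-up numbers the table needs more blocks than a single MAU has, which is exactly the case I am asking about.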
Is that even allowed? Will the P4 compiler “split” the one logical TCAM lookup table across multiple MAUs and parallelize the lookups? If so, how are the results of the separate lookups combined into a single decision once they have all completed? Lastly, is the pipeline strictly linear, or can data from the parser travel directly to, say, MAU 2 and do a lookup there without passing through MAU 0 and MAU 1?
Thank you! Any guidance will be appreciated.