Note: Some of the P4 limitations I describe here are not necessarily limitations of the P4 language as written in its specification. However, they are limits in most P4 implementations, so whether the language definition limits these things doesn’t matter much when it comes to actually running P4 programs on a low cost-per-Gbps-of-throughput device.
In most existing P4 architectures, you must do parsing first, and complete it, before starting to do any table lookups. You can still go back to parsing after doing table lookups using operations that many P4 architectures provide, e.g. resubmit or recirculate operations in the v1model, PSA, and TNA switch architectures, but this usually has a noticeable and undesirable performance cost.
In EBPF programs, any parsing can be mingled freely with other code, including code that does lookups on EBPF maps (which have some similarities with P4 tables).
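To make that concrete, here is a minimal sketch in plain user-space C, not actual EBPF code. The toy packet layout, the `port_map` array standing in for an EBPF map, and the names `port_map_lookup`, `classify`, and `TUNNEL_HDR_LEN` are all assumptions made up for illustration. The point is only the control flow: parse a little, do a lookup, then let the lookup result decide what to parse next.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy packet layout (an assumption for this sketch):
 *   bytes 0..1  outer port, big-endian
 *   if the outer port is a tunnel port: 8-byte tunnel header, then
 *   a 2-byte big-endian inner port */
#define TUNNEL_HDR_LEN 8

/* Hypothetical stand-in for an EBPF map: port -> "is this a tunnel?" */
struct port_entry { uint16_t port; uint8_t is_tunnel; };
static const struct port_entry port_map[] = {
    { 4789, 1 },  /* treated as a tunnel port in this sketch */
    { 53,   0 },
};

static const struct port_entry *port_map_lookup(uint16_t port) {
    for (size_t i = 0; i < sizeof port_map / sizeof port_map[0]; i++)
        if (port_map[i].port == port)
            return &port_map[i];
    return NULL;
}

/* Returns the port that identifies the flow, or -1 if the packet is truncated. */
int classify(const uint8_t *pkt, size_t len) {
    if (len < 2)
        return -1;
    /* Parse step 1: read the outer port. */
    uint16_t outer = (uint16_t)((pkt[0] << 8) | pkt[1]);

    /* Lookup in the middle of parsing. */
    const struct port_entry *e = port_map_lookup(outer);
    if (e == NULL || !e->is_tunnel)
        return outer;

    /* Parse step 2: only reached because of the lookup result. */
    if (len < 2 + TUNNEL_HDR_LEN + 2)
        return -1;
    const uint8_t *inner = pkt + 2 + TUNNEL_HDR_LEN;
    return (inner[0] << 8) | inner[1];
}
```

In a typical P4 architecture the equivalent decision would have to be encoded in the parser's own state machine, since the table lookup is not available until parsing has finished.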
In EBPF programs, as long as you do the necessary locking to make it safe in a multi-CPU-core system, you can do a lookup on a map, and modify the corresponding value in pretty much arbitrary ways.
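As a rough illustration of "look up, then modify in place", here is a sketch in plain user-space C using C11 atomics. It is a model, not EBPF code: the `flow_stats` struct, the `flow_table` array, and the functions `flow_lookup` and `account_packet` are hypothetical names. In a real EBPF program the pointer would come from `bpf_map_lookup_elem`, and the safety would come from atomic operations or a spin lock stored in the map value.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-flow value; stands in for an EBPF map value type. */
struct flow_stats {
    atomic_uint_fast64_t packets;
    atomic_uint_fast64_t bytes;
    atomic_uint_fast32_t max_len;
};

#define FLOW_SLOTS 8
static struct flow_stats flow_table[FLOW_SLOTS];  /* zero-initialized */

/* Lookup returns a pointer to the stored value, so the caller can modify it. */
static struct flow_stats *flow_lookup(uint32_t flow_id) {
    return flow_id < FLOW_SLOTS ? &flow_table[flow_id] : NULL;
}

/* Returns 0 on success, -1 if there is no entry for flow_id. */
int account_packet(uint32_t flow_id, uint32_t len) {
    struct flow_stats *s = flow_lookup(flow_id);
    if (s == NULL)
        return -1;

    /* Counter-like updates. */
    atomic_fetch_add(&s->packets, 1);
    atomic_fetch_add(&s->bytes, len);

    /* A running maximum, updated with a compare-and-swap loop so that
     * concurrent updaters on other cores cannot lose a larger value. */
    uint_fast32_t seen = atomic_load(&s->max_len);
    while (len > seen &&
           !atomic_compare_exchange_weak(&s->max_len, &seen, len)) {
        /* seen was refreshed by the failed exchange; retry */
    }
    return 0;
}
```

The two counters could be expressed with a counter extern, but the running maximum is the kind of update that goes beyond what counter and meter methods offer.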
In many P4 implementations, table contents, both keys and values, are read-only from the P4 program, and can only be modified from the control plane software. There are P4 implementations with DirectCounter, DirectMeter, and DirectRegister externs that can add modifiable state to every entry of a table, but for DirectCounter and DirectMeter, that additional per-table-entry state can only be modified in very limited ways as defined by the methods of those externs. The DirectRegister extern in TNA is closer to the generality that EBPF permits, but at least in the TNA implementation you are only allowed to make modifications that can be finished within one Tofino ASIC clock cycle, which is fairly limiting.
EBPF enables you to access the same map as many times as you want. Most P4 implementations limit you to accessing each P4 table at most once per “pass” over the packet (i.e. once when the packet is first processed, and again only if you use one of the operations like resubmit or recirculate mentioned above).
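A small sketch of what repeated access buys you, again in plain user-space C rather than real EBPF: one table is consulted twice for the same packet, once for the source address and once for the destination. The `host_table` contents and the names `host_lookup` and `both_trusted` are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a single EBPF map keyed by IPv4 address. */
struct host_entry { uint32_t addr; uint8_t trusted; };
static const struct host_entry host_table[] = {
    { 0x0A000001u, 1 },  /* 10.0.0.1 */
    { 0x0A000002u, 1 },  /* 10.0.0.2 */
    { 0xC0A80001u, 0 },  /* 192.168.0.1 */
};

static const struct host_entry *host_lookup(uint32_t addr) {
    for (size_t i = 0; i < sizeof host_table / sizeof host_table[0]; i++)
        if (host_table[i].addr == addr)
            return &host_table[i];
    return NULL;
}

/* Returns 1 only if both endpoints are present in the table and trusted. */
int both_trusted(uint32_t src, uint32_t dst) {
    const struct host_entry *s = host_lookup(src);  /* first access */
    const struct host_entry *d = host_lookup(dst);  /* second access, same table */
    return s != NULL && d != NULL && s->trusted && d->trusted;
}
```

With a one-access-per-pass restriction, the same check would typically need either two tables kept in sync by the control plane or a second pass over the packet.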
Don’t get me wrong: compared with a general-purpose user space C program running on Linux, P4 and EBPF look quite similar in many of their limitations. I only noticed the differences above from spending lots of time with both P4 and EBPF. EBPF is a bit more general in what it currently supports than most P4 implementations are. Because of that extra generality, mechanically translating P4 to EBPF with a compiler is much easier than attempting the translation in the opposite direction.