Difference between P4 architectures - PISA vs PSA

PSA was developed and published later than PISA. One of its goals was to be precise: to include most or all of the details visible to a P4 developer that you would need in order to answer questions like “what does multicast replication do, and what packets show up in egress as a result?” and “what does cloning/mirroring do, and what packets show up in egress as a result, with all of their headers and contents, down to the last bit?”

If I recall correctly (I haven’t re-read the source you mention by Hauser et al. recently), PISA, by contrast, is described at a higher level and is not detailed enough to answer those kinds of questions. This is not a criticism of PISA; it was written with a different purpose in mind, and those details would have been a distraction from the purpose of the paper.

TNA was initially developed as a proprietary architecture by Barefoot, and later by Intel after it acquired Barefoot. Its details were very well fleshed out, but they were only given to people who had signed an NDA (Non-Disclosure Agreement), and were not published in more detail until 2021, in the file PUBLIC_Tofino-Native-Arch.pdf in the barefootnetworks/Open-Tofino repository on GitHub.

The Tofino ASICs have always had a hardware deparser at the end of ingress, before the traffic manager. The P4_14 language plus architecture did its best to hide it from the P4 developer: the P4_14 compiler for Tofino configured the ingress deparser hardware itself, so the developer often did not need to know that it existed.