How to perform large-scale delay and throughput packet-sending tests for P4?

I’m deploying a publish-subscribe system on a Mininet topology. I’d like every host in the topology to send packets, and I want a throughput of 500+ on the sender side.
I have tried using sendpfast() instead of sendp(), but the sender-side throughput is still lower than expected. I also still have to open an xterm for every host in Mininet.
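To measure what packet rate a single host actually reaches, here is a minimal sketch. It assumes the publish/subscribe messages can be carried as plain UDP payloads; if the P4 program parses a custom header, a raw socket or Scapy would still be needed. It swaps Scapy for a standard-library UDP socket, which avoids building a Scapy packet object on every send. The function name `blast_udp`, port 5000, the `--count` flag and the 64-byte payload are illustrative choices, not part of the original setup. It is still a Python loop, so it reports the achieved rate rather than guaranteeing any target.

```python
import argparse
import socket
import time
from typing import Tuple


def blast_udp(dst_ip: str, dst_port: int, count: int, payload_len: int = 64) -> Tuple[int, float]:
    """Send `count` UDP datagrams back to back; return (packets sent, packets per second)."""
    # Hypothetical fixed payload; a real publisher would encode its message here.
    payload = b"\x00" * payload_len
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sent = 0
    start = time.perf_counter()
    try:
        for _ in range(count):
            sock.sendto(payload, (dst_ip, dst_port))
            sent += 1
    finally:
        sock.close()
    elapsed = time.perf_counter() - start
    # Guard against a zero-length interval on very small counts.
    pps = sent / elapsed if elapsed > 0 else float("inf")
    return sent, pps


if __name__ == "__main__":
    # Run inside a Mininet host, e.g. "python3 sender.py 10.0.0.2" (script name is hypothetical).
    parser = argparse.ArgumentParser(description="Illustrative UDP sender with rate measurement")
    parser.add_argument("dst_ip")
    parser.add_argument("--port", type=int, default=5000)
    parser.add_argument("--count", type=int, default=100000)
    args = parser.parse_args()
    n, rate = blast_udp(args.dst_ip, args.port, args.count)
    print(f"sent {n} packets at {rate:.0f} pkt/s")
```

Comparing the printed rate with what sendp() or sendpfast() achieves on the same host shows whether the bottleneck is per-packet overhead on the sender.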
Is there another way to send packets at a large scale? And is there a way to make the hosts send packets without opening an xterm for every host in Mininet?
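One way to avoid xterm entirely, sketched under assumptions: start the senders from the Mininet Python script itself. Each Mininet host object has a `popen()` method that runs a command inside that host's network namespace and returns a `subprocess.Popen` object, so all hosts can be launched in the background from one place. The script name `sender.py`, its `--count` flag and the host-to-destination mapping below are hypothetical placeholders.

```python
from typing import Dict, List


def sender_argv(script: str, dst_ip: str, count: int) -> List[str]:
    """Command line for one sender process (script name and flags are hypothetical)."""
    return ["python3", script, dst_ip, "--count", str(count)]


def start_senders(net, script: str, dst_of: Dict[str, str], count: int = 100000) -> Dict[str, object]:
    """Launch a sender on every host listed in dst_of, in the background, without xterm.

    `net` is expected to be a started mininet.net.Mininet; only `net.hosts`,
    `host.name` and `host.popen()` are used.
    """
    procs = {}
    for host in net.hosts:
        dst_ip = dst_of.get(host.name)
        if dst_ip is None:
            continue  # this host only subscribes in this run
        # popen() returns immediately, so all senders run concurrently.
        procs[host.name] = host.popen(sender_argv(script, dst_ip, count))
    return procs


def wait_for_senders(procs: Dict[str, object]) -> Dict[str, str]:
    """Block until every sender exits and collect its stdout as text."""
    results = {}
    for name, proc in procs.items():
        out, _err = proc.communicate()
        results[name] = out.decode() if isinstance(out, bytes) else out
    return results
```

With a started network, something like `procs = start_senders(net, "sender.py", {"h1": "10.0.0.2", "h2": "10.0.0.3"})` followed by `wait_for_senders(procs)` starts every sender at once and gathers their output, with no terminal windows involved.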