Throughput measurement in Tofino

Hi,

I’m currently working on evaluating the throughput of a Tofino-based switch and have a couple of questions regarding measurement approaches. My topology is shown below.

1. Built-in throughput measurement in Tofino

As far as I understand, Tofino provides intrinsic timestamping capabilities in the data plane. However, I’m not sure whether there is any built-in mechanism in the data plane to directly measure throughput (e.g., bytes/sec or packets/sec), or if the only viable approach is to derive throughput indirectly.

For example:

  • Using ingress/egress counters exposed to the control plane

  • Or computing throughput based on timestamp differences

Is there any recommended or standard way to measure throughput directly on Tofino?
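For the counter-based option, the arithmetic is just a delta between two snapshots. In practice you would read a port or P4 counter through the control-plane runtime API (the exact call depends on your SDE version); the sketch below only shows the computation, with the snapshot values supplied by hand:

```python
def throughput_from_counters(bytes_before, bytes_after, interval_s):
    """Average throughput in bits/sec from two byte-counter snapshots
    taken interval_s seconds apart. Assumes the counter is monotonic
    (not cleared) between the two reads."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return (bytes_after - bytes_before) * 8 / interval_s

# Example: a hypothetical egress counter advances by 12.5e9 bytes
# over a 1-second polling interval -> 100 Gbps.
print(throughput_from_counters(0, 12_500_000_000, 1.0))  # 100000000000.0
```

The same pattern works for packets/sec by substituting the packet counter for the byte counter.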


2. Feasibility of timestamp-based measurement at scale

If we rely on timestamps to compute throughput (e.g., tracking per-packet timestamps and computing differences), scalability becomes a concern.

In high-speed scenarios, the number of packets can easily reach hundreds of millions. Recording timestamps for each packet seems impractical due to:

  • Limited on-chip memory

  • Bandwidth constraints for exporting telemetry data

So my questions are:

  • Is per-packet timestamping a practical approach for throughput measurement on Tofino?

  • Are there common design patterns (e.g., sampling, aggregation, sketching) to handle this at scale?

  • Would using counters be the preferred approach instead?
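To illustrate the sampling pattern mentioned above: instead of recording every packet's timestamp, one common design is to sample 1-in-N packets (e.g., via mirroring to the CPU) and scale the estimate back up. This is a minimal sketch of that estimator, with synthetic timestamps standing in for exported telemetry; the names and the 1-in-N scheme are illustrative assumptions, not a Tofino API:

```python
def estimate_pps(sampled_timestamps_ns, sample_rate):
    """Estimate packets/sec from timestamps of 1-in-sample_rate
    sampled packets. Each sample is assumed to represent
    sample_rate packets on the wire."""
    if len(sampled_timestamps_ns) < 2:
        return 0.0
    span_s = (sampled_timestamps_ns[-1] - sampled_timestamps_ns[0]) / 1e9
    return (len(sampled_timestamps_ns) - 1) * sample_rate / span_s

# Synthetic stream: packets arriving every 1000 ns, no sampling (N=1)
# -> 1e6 packets/sec.
ts = [i * 1000 for i in range(1001)]
print(estimate_pps(ts, 1))  # 1000000.0
```

Note the trade-off: sampling cuts memory and export bandwidth by a factor of N, at the cost of estimation variance for bursty traffic, which is why simple counters are usually preferred when only aggregate throughput is needed.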


Any insights, best practices, or references would be greatly appreciated.

Thanks!

Dear @1418915702,

First of all, I think it’s important to define what exactly it is that you want to measure.

The topology you have drawn shows some DPDK server sending a packet into Tofino that seems to disappear right inside it. :smile:

Here are some quick thoughts:

  1. Using timestamps to measure throughput is almost always a bad idea. Timestamps are used to measure processing delay (processing latency).
  2. You almost certainly will not be able to measure Tofino performance using a server with a network card, because Tofino is a much higher-performance system.
  3. Overall, Tofino is spec’d at 6.4 Tbps (at a 140-byte average frame size), and in most cases its performance can be accurately predicted.

Happy hacking,
Vladimir


Hi Vladimir,

Thanks a lot for your comments.

I’d like to clarify that my goal is not to measure the maximum throughput of the Tofino switch itself. Instead, I have designed a P4-based staged pipeline processing architecture, and I am interested in evaluating the throughput characteristics of this specific design.

Regarding the measurement methodology, I have updated my approach. I am now using DPDK-based servers on both the transmitting (TX) and receiving (RX) sides to measure the throughput of the designed P4 pipeline under different packet sizes. The focus is on understanding how the pipeline behaves under varying traffic profiles rather than pushing the hardware to its absolute limits.

I appreciate your insights on measurement techniques and system limitations — they are very useful in refining the evaluation setup.

Best regards

Dear @1418915702 ,

As long as you have a program where a packet simply traverses the pipeline from ingress to egress (no recirculation, resubmit, mirroring, etc.), that program should process packets at line rate, assuming that:

  1. Parsing stays within the parser budget (meaning that parsing an individual packet does not exceed the time required to receive it)
  2. The average packet size (per pipe) stays below 140 bytes (meaning that the total number of packets received on 16x100Gbps ports connected to that pipe does not exceed 1.22 billion packets per second).

If you use a single ingress and egress port, the second condition is automatically satisfied, so you should see 100 Gbps throughput for all frame sizes that satisfy the parser budget.
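The packet-rate ceiling above follows directly from Ethernet framing overhead. Assuming the standard 20 bytes per frame of preamble (8 B) plus inter-frame gap (12 B), the line-rate packet rate for a given frame size can be worked out as follows (a back-of-the-envelope sketch, not a Tofino spec):

```python
def line_rate_pps(link_gbps, frame_bytes, overhead_bytes=20):
    """Maximum packets/sec at line rate for a given frame size.
    overhead_bytes: Ethernet preamble (8 B) + inter-frame gap (12 B)."""
    wire_bits = (frame_bytes + overhead_bytes) * 8
    return link_gbps * 1e9 / wire_bits

# 64-byte frames on a single 100 Gbps port: ~148.8 Mpps
print(round(line_rate_pps(100, 64)))
# 16 x 100 Gbps (one pipe) at 140-byte frames: ~1.25 bn pps,
# in the same ballpark as the ~1.22 bn figure quoted above.
print(line_rate_pps(1600, 140))
```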

The only thing that will change is the latency, but the P4i tool already calculates the variable portion of it.

I do not know what exactly you mean by “traffic profiles”. If by that you mean different mixtures of packets, please see my explanation above. If by “traffic profiles” you mean “various kinds of congestion modes”, sure, you can measure those, but the results will depend not on the specific P4 program, but rather on the congestion conditions and how they are handled by the Tofino Traffic Manager. In general, you should see line-rate performance for as long as the bursts caused by congestion can be absorbed by the traffic manager.

To summarize – Tofino is a very deterministic device. When you see drops in its throughput, it doesn’t mean that it suddenly started processing packets more slowly. Instead, it means that congestion has been created somewhere, and the device starts dropping some packets because it cannot absorb that congestion otherwise.

Happy hacking,
Vladimir
