How to Benchmark Local AI Performance Properly

Measure tokens per second, first-token latency, warm starts, and real workload behaviour before upgrading.

Robson Pereira | May 25, 2026 | 8 min read
[Image: Performance testing dashboard for a local AI server.]

Good benchmarking is less about chasing one impressive number and more about learning how the system behaves under the same conditions every time. If you measure carefully, you can tell whether a new GPU, a larger model, or a different runtime actually helps.

Measure the right things

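Track first-token latency (how long you wait before output starts), tokens per second (how fast text is generated once it has started), memory pressure (how close RAM and VRAM are to their limits), and the gap between a cold start, when the model still has to load, and a warm one. Those numbers tell you more about daily use than a single synthetic score.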
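As a minimal sketch of how the first two numbers can be captured, the helper below times any iterable that yields tokens as they arrive. The names measure_stream and fake_stream are made up for this illustration, and fake_stream only sleeps to imitate a runtime; in practice you would pass in whatever streaming iterator your own runtime's client gives you. Decode speed is measured from the first token onwards so that prompt processing does not skew it.

```python
import time
from typing import Callable, Iterable, Iterator


def measure_stream(tokens: Iterable[str],
                   clock: Callable[[], float] = time.perf_counter) -> dict:
    """Time one streamed response.

    Returns first-token latency in seconds and decode speed in tokens per
    second. Decode speed ignores the wait before the first token.
    """
    start = clock()
    first = None
    last = start
    count = 0
    for _ in tokens:
        last = clock()
        if first is None:
            first = last
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    decode_time = last - first
    if count > 1 and decode_time > 0:
        tokens_per_s = (count - 1) / decode_time
    else:
        tokens_per_s = float("nan")  # not enough tokens to measure speed
    return {
        "first_token_s": first - start,
        "tokens": count,
        "tokens_per_s": tokens_per_s,
        "total_s": last - start,
    }


def fake_stream(n: int, prefill_s: float = 0.05,
                per_token_s: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a real runtime: waits, then emits n tokens."""
    time.sleep(prefill_s)
    for i in range(n):
        if i:
            time.sleep(per_token_s)
        yield f"tok{i}"


if __name__ == "__main__":
    # A longer wait before the first token imitates a cold start.
    print("cold:", measure_stream(fake_stream(20, prefill_s=0.2)))
    print("warm:", measure_stream(fake_stream(20, prefill_s=0.05)))
```

Memory pressure is not covered here; it is usually easier to read from the operating system or GPU tooling while the test runs.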

Pair this with Best Hardware for Self-Hosted AI so the numbers can influence hardware planning.

Keep the prompt set fixed

Use the same small set of prompts every time. Include a short factual question, a longer summarisation request, and one task that stresses context or reasoning. If the prompts change between runs, you can no longer tell a real difference from noise.
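One way to keep the set honest is to store it in a file and record a short fingerprint of it alongside every result, so two runs are only compared when the fingerprints match. The sketch below assumes that approach; the prompt texts, the ids, and the function name prompt_set_fingerprint are placeholders, not a recommended suite.

```python
import hashlib
import json

# Placeholder prompts: one short factual question, one longer summarisation
# request, and one task that leans on reasoning. Replace with your own and
# then leave them alone.
PROMPTS = [
    {"id": "short_fact",
     "text": "What is the boiling point of water at sea level in Celsius?"},
    {"id": "long_summary",
     "text": "Summarise the following text in five bullet points:\n"
             "<paste a fixed document here>"},
    {"id": "reasoning",
     "text": "A meeting starts at 09:15 and runs for 135 minutes. "
             "When does it end? Explain each step."},
]


def prompt_set_fingerprint(prompts: list) -> str:
    """Return a short, stable hash of the prompt set.

    Any edit to any prompt changes the fingerprint, which makes it obvious
    when two result files were produced from different prompt sets.
    """
    canonical = json.dumps(prompts, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


if __name__ == "__main__":
    print("prompt set:", prompt_set_fingerprint(PROMPTS))
```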

Test under load too

One request is not the same as three users and a background indexing job. If you want to know whether the system is usable, simulate the real mix of work it will face.
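A rough sketch of the "three users" part, assuming your runtime can be called through a plain blocking function: a thread pool sends several prompts at once and records how long each request took. The names run_under_load and fake_send are invented for this example, and fake_send only sleeps; swap in a real call to your own server. The background indexing job is not simulated here and would need to be started separately.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def run_under_load(send: Callable[[str], object],
                   prompts: Sequence[str],
                   concurrency: int = 3) -> dict:
    """Send prompts with `concurrency` requests in flight and time each one."""

    def timed(prompt: str) -> float:
        t0 = time.perf_counter()
        send(prompt)
        return time.perf_counter() - t0

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed, prompts))
    wall = time.perf_counter() - wall_start

    return {
        "requests": len(latencies),
        "median_s": statistics.median(latencies),
        "max_s": max(latencies),
        "wall_s": wall,
    }


def fake_send(prompt: str) -> str:
    """Hypothetical stand-in for a real request; it only sleeps."""
    time.sleep(0.05)
    return "ok"


if __name__ == "__main__":
    prompts = ["same prompt"] * 6
    print("1 user: ", run_under_load(fake_send, prompts, concurrency=1))
    print("3 users:", run_under_load(fake_send, prompts, concurrency=3))
```

With a real server, the interesting comparison is how median and worst-case latency move as concurrency goes from one to the number of people who will actually use the system.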

Compare before and after changes

Benchmark the current setup, make one change, then run the same tests again. This is far more useful than stacking multiple upgrades and trying to guess which one helped.
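The before-and-after comparison can be as simple as the sketch below, which takes the median of several runs on each side and reports the percentage change. The function name compare_runs and the 5% noise threshold are assumptions made for illustration; pick a threshold based on how much your own repeated runs vary. The numbers in the usage section are invented placeholders, not measurements.

```python
import statistics
from typing import Sequence


def compare_runs(before: Sequence[float],
                 after: Sequence[float],
                 higher_is_better: bool = True,
                 noise_pct: float = 5.0) -> dict:
    """Compare medians of repeated runs taken before and after one change.

    Set higher_is_better=False for latency-style metrics. Changes smaller
    than noise_pct (an arbitrary default) are reported as "within noise".
    """
    if not before or not after:
        raise ValueError("need at least one run on each side")
    b = statistics.median(before)
    a = statistics.median(after)
    change_pct = (a - b) / b * 100.0
    if abs(change_pct) < noise_pct:
        verdict = "within noise"
    elif (change_pct > 0) == higher_is_better:
        verdict = "better"
    else:
        verdict = "worse"
    return {"before": b, "after": a,
            "change_pct": change_pct, "verdict": verdict}


if __name__ == "__main__":
    # Placeholder values only: tokens per second from three runs each.
    print(compare_runs([20.0, 21.0, 19.0], [24.0, 25.0, 23.0]))
    # Placeholder values only: first-token latency in seconds.
    print(compare_runs([0.5, 0.6, 0.4], [0.4, 0.45, 0.35],
                       higher_is_better=False))
```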

Read Proxmox Setup for AI Workloads if your benchmark must account for virtualisation, storage layers, or GPU passthrough.

Conclusion

The best benchmark is repeatable, boring, and honest. It tells you whether the build got better in the ways that matter to you, not just on a chart.
