AI Model Benchmarks

Comparison and analysis of AI models across key performance metrics including quality, speed, latency and cost.

Coding agents

Terminal Bench 2.0

pass rate (%)

6 models

End-to-end coding-agent benchmark. All models are self-hosted via vLLM on NVIDIA A100 GPUs. Each model is paired with an agent framework (Claude Code or OpenCode) and evaluated on Terminal Bench 2.0.

minimax-m2.7 (MiniMax) · OpenCode: 34.3
minimax-m2.7 (MiniMax) · Claude Code: 31.5
qwen3.6-35b-a3b (Qwen) · Claude Code: 21.0
qwen3.5-35b-a3b (Qwen) · Claude Code: 21.0
Unlabeled entry: 11.1
Unlabeled entry: 10.4

Higher is better · Source: Kanon benchmark runs
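Pass rate is the share of benchmark tasks an agent completes successfully, expressed as a percentage. The sketch below assumes one pass/fail outcome per task; `pass_rate` is a hypothetical helper, and how Terminal Bench 2.0 aggregates repeated trials, if it runs any, is not stated on this page.

```python
from typing import Sequence


def pass_rate(task_passed: Sequence[bool]) -> float:
    """Return the percentage of tasks whose checks passed.

    Assumption: exactly one boolean outcome per task (no trial averaging).
    """
    if not task_passed:
        raise ValueError("need at least one task result")
    # True counts as 1 and False as 0 when summed.
    return 100.0 * sum(task_passed) / len(task_passed)


if __name__ == "__main__":
    # Hypothetical outcomes for four tasks: three passes, one failure.
    print(f"{pass_rate([True, False, True, True]):.1f}")  # 75.0
```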

Text-to-Speech

Real-time factor p50

seconds of audio ÷ wall-clock seconds; higher is faster

4 models

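Text-to-speech inference speed, measured as the median (p50) Real-Time Factor (RTF). All models are self-hosted via vLLM on NVIDIA A100 GPUs. An RTF above 1× means audio is synthesized faster than real time; for example, 4× produces four seconds of audio per second of wall-clock time.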

omnivoice (k2-fsa): 5.33×
voxcpm2 (OpenBMB): 4.02×
s2-pro (Fish Audio): 3.52×
Unlabeled entry: 1.70×

Higher is better · Source: Kanon benchmark runs
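The RTF used here is audio duration divided by wall-clock synthesis time, and p50 is the median across requests. A minimal sketch, assuming one hypothetical `(audio_seconds, wall_seconds)` timing pair is recorded per synthesis request; the page does not say exactly which steps the wall-clock time covers.

```python
import statistics
from typing import Iterable, Tuple


def real_time_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio produced per second of wall-clock time (higher is faster)."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def rtf_p50(runs: Iterable[Tuple[float, float]]) -> float:
    """Median (p50) RTF over hypothetical (audio_seconds, wall_seconds) pairs."""
    # statistics.median sorts its input, so a generator is accepted.
    return statistics.median(real_time_factor(a, w) for a, w in runs)


if __name__ == "__main__":
    # Three hypothetical requests with RTFs of 5.0, 2.0 and 4.0.
    print(f"{rtf_p50([(10.0, 2.0), (6.0, 3.0), (8.0, 2.0)]):.2f}×")  # 4.00×
```

Using the median rather than the mean keeps a few slow outlier requests from dominating the reported speed.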

Speech-to-Text

LibriSpeech (mini) WER

word error rate, lower is better

2 models

Speech recognition accuracy, measured as Word Error Rate (WER) on a mini subset of LibriSpeech. All models are self-hosted via vLLM on NVIDIA A100 GPUs.

39.8%
39.8%

Lower is better · Source: Kanon benchmark runs
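WER counts the word substitutions (S), deletions (D) and insertions (I) needed to turn a model's transcript into the reference, divided by the number of reference words N: WER = (S + D + I) / N. The sketch below computes this with a word-level Levenshtein distance. It assumes transcripts are already normalized (casing, punctuation); the normalization this benchmark applies is not stated, and `word_error_rate` is a hypothetical helper.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / reference word count."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Before processing reference word i, prev[j] is the edit distance
    # between the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference word r
                curr[j - 1] + 1,         # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion ("the") over 6 words.
    print(f"{word_error_rate('the cat sat on the mat', 'the cat sit on mat'):.1%}")  # 33.3%
```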