AI Model Benchmarks

Comparison and analysis of AI models across key performance metrics including quality, speed, latency and cost.

Coding agents

Terminal Bench 2.0

pass rate (%)

6 models

End-to-end coding agent benchmark. All models self-hosted via vLLM on NVIDIA A100 GPUs. Each model is paired with a coding framework (Claude Code, OpenCode) and evaluated on Terminal Bench 2.0.

Pass rates shown (%): 34.3, 31.5, 21.0, 11.1, 10.4

Frameworks shown, in chart order: OpenCode, OpenCode, Claude Code, Claude Code, Claude Code, OpenCode

Higher is better · Source: Kanon benchmark runs
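
As a rough illustration of the metric, a pass rate is the share of benchmark tasks an agent completes successfully, expressed as a percentage. The sketch below assumes each task yields a single pass/fail outcome; the function name and the sample outcomes are hypothetical and are not taken from the Terminal Bench 2.0 harness.

```python
def pass_rate(results: list[bool]) -> float:
    """Return the percentage of tasks that passed (0 to 100)."""
    if not results:
        raise ValueError("at least one task result is required")
    # True counts as 1 and False as 0, so sum() gives the number of passes.
    return 100.0 * sum(results) / len(results)


if __name__ == "__main__":
    # Hypothetical outcomes for eight tasks, not real benchmark data.
    outcomes = [True, False, True, True, False, False, False, True]
    print(f"pass rate: {pass_rate(outcomes):.1f}%")
```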

Text-to-Speech

Real-time factor p50

audio-sec ÷ wall-sec, higher is faster

4 models

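TTS inference speed measured as the median (p50) Real-Time Factor (RTF), defined here as seconds of audio produced per second of wall-clock time. All models self-hosted via vLLM on NVIDIA A100 GPUs. An RTF above 1× means synthesis runs faster than real time; higher is faster.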

RTF p50 values shown: 5.33×, 4.02×, 3.52×, 1.70×

Providers shown, in chart order: k2-fsa, openbmb, qwen, fish audio

Higher is better · Source: Kanon benchmark runs
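
The RTF definition above (audio-sec ÷ wall-sec, reported at p50) can be sketched as follows. This is a minimal illustration, assuming each synthesis request records the duration of the generated audio and the wall-clock time it took; the function names and sample timings are hypothetical and do not come from the benchmark runs.

```python
import statistics


def real_time_factor(audio_seconds: float, wall_seconds: float) -> float:
    """RTF as defined on this page: audio duration divided by wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def rtf_p50(samples: list[tuple[float, float]]) -> float:
    """Median RTF over (audio_seconds, wall_seconds) pairs."""
    if not samples:
        raise ValueError("at least one sample is required")
    return statistics.median([real_time_factor(a, w) for a, w in samples])


if __name__ == "__main__":
    # Hypothetical timings: (seconds of audio produced, seconds of wall time).
    timings = [(10.0, 2.0), (8.0, 4.0), (6.0, 1.5)]
    print(f"RTF p50: {rtf_p50(timings):.2f}x")
```

Note that some sources define RTF the other way round (wall time divided by audio duration), in which case lower is faster; the sketch follows the convention stated on this page.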

Speech-to-Text

LibriSpeech (mini) WER

word error rate, lower is better

2 models

Speech recognition accuracy measured as Word Error Rate (WER) on a mini subset of LibriSpeech. All models self-hosted via vLLM on NVIDIA A100 GPUs. Lower is better.

WER shown: 39.8%

Models shown, in chart order: whisper-large-v3 (openai), qwen3-asr-1.7b (qwen)

Lower is better · Source: Kanon benchmark runs
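
For reference, WER is conventionally the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the model's output, divided by the number of reference words. The sketch below computes it for a single utterance. The text normalization used here (lowercasing and whitespace splitting) is an assumption; the page does not say how the benchmark normalizes transcripts or aggregates across utterances.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance from i reference words to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts, not drawn from LibriSpeech.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {wer:.1%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.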