AI Model Benchmarks
Comparison and analysis of AI models across key performance metrics including quality, speed, latency and cost.
Coding agents
Terminal Bench 2.0
pass rate (%)
End-to-end coding agent benchmark. All models are self-hosted via vLLM on NVIDIA A100 GPUs. Each model is paired with a coding framework (Claude Code or OpenCode) and evaluated on Terminal Bench 2.0.
[Chart: pass rate (%) for six model runs, each labeled with its coding framework (three with OpenCode, three with Claude Code)]
Higher is better · Source: Kanon benchmark runs
Text-to-Speech
Real-time factor p50
audio-sec ÷ wall-sec, higher is faster
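TTS inference speed, measured as the median (p50) Real-Time Factor (RTF): seconds of audio generated per second of wall-clock time. All models self-hosted via vLLM on NVIDIA A100 GPUs. A higher RTF means faster synthesis, and an RTF above 1 means faster than real time.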
Higher is better · Source: Kanon benchmark runs
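The RTF definition above (audio-sec ÷ wall-sec, reported at p50) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the benchmark's own harness: the function names and the sample measurements are hypothetical, and the median is taken over per-request RTF values, which is an assumption about how p50 is aggregated.

```python
import statistics


def real_time_factor(audio_seconds: float, wall_seconds: float) -> float:
    """RTF as defined on this page: seconds of audio per second of wall time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def rtf_p50(runs: list[tuple[float, float]]) -> float:
    """Median RTF over (audio_seconds, wall_seconds) pairs (assumed aggregation)."""
    return statistics.median(real_time_factor(a, w) for a, w in runs)


# Hypothetical measurements, not benchmark data: (audio_seconds, wall_seconds)
runs = [(10.0, 2.0), (8.0, 4.0), (6.0, 6.0)]
print(rtf_p50(runs))  # per-run RTFs are 5.0, 2.0, 1.0, so the median is 2.0
```

Under this definition, a value of 2.0 means that two seconds of audio are produced for every second of wall-clock time.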
Speech-to-Text
LibriSpeech (mini) WER
word error rate, lower is better
Speech recognition accuracy, measured as Word Error Rate (WER) on a mini subset of LibriSpeech. All models self-hosted via vLLM on NVIDIA A100 GPUs. Lower is better.
[Chart: WER per model, with models from OpenAI and Qwen]
Lower is better · Source: Kanon benchmark runs
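For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between the transcript and the reference, divided by the number of reference words. The sketch below computes it for a single utterance. The function name is hypothetical, and the lowercase-and-split normalization is an assumption; the normalization used in the benchmark runs is not stated here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hypothesis takes i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(round(wer, 3))  # 0.333
```

A corpus-level score is usually obtained by summing the edit distances over all utterances and dividing by the total number of reference words, rather than by averaging per-utterance values.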