LiveBench Instruction Following category score (0–100). Measures how precisely a model follows complex, multi-constraint instructions.
Google: Gemini 3.1 Pro
OpenAI: GPT-5.5
OpenAI: GPT-5.4
Google: Gemini 3.1 Flash Lite
Z.ai: GLM-5.1
Google: Gemma 4 31B
Moonshot AI: Kimi K2.6
OpenAI: GPT-5 Mini
OpenAI: GPT-5
Grok: Grok 4.20 (Reasoning)
Anthropic: Claude Opus 4.6
MiniMax: MiniMax M2.7
Alibaba: Qwen3.6 Plus
Moonshot AI: Kimi K2.5
MiniMax: MiniMax M2.5
Z.ai: GLM-5
OpenAI: GPT-5 Nano
Anthropic: Claude Sonnet 4.6
OpenAI: GPT OSS 120B
Anthropic: Claude Opus 4.7
Anthropic: Claude Sonnet 4
Xiaomi: MiMo-V2-Pro
Anthropic: Claude Opus 4.1
Google: Gemini 2.5 Pro
Anthropic: Claude Opus 4.5
Google: Gemini 2.5 Flash
NVIDIA: Nemotron 3 Super
Google: Gemini 3 Flash
Grok: Grok 4.1 Fast (Reasoning)
Grok: Grok 4.20
Anthropic: Claude Sonnet 4.5
Google: Gemini 2.5 Flash Lite
DeepSeek: DeepSeek V3.2
OpenAI: GPT-5.4 Mini
Anthropic: Claude Haiku 4.5
Grok: Grok 4.1 Fast
OpenAI: GPT-5.4 Nano
Arcee AI: Trinity Large Thinking