LiveBench agentic coding category score (0 to 100). Evaluates multi-step coding ability, such as autonomous code generation, modification, and testing.
OpenAI: GPT-5.4
Google: Gemini 3.1 Pro
Anthropic: Claude Sonnet 4.6
Anthropic: Claude Opus 4.6
Anthropic: Claude Opus 4.7
Moonshot AI: Kimi K2.6
OpenAI: GPT-5.5
Z.ai: GLM-5
Z.ai: GLM-5.1
Alibaba: Qwen3.6 Plus
OpenAI: GPT-5
MiniMax: MiniMax M2.5
Anthropic: Claude Opus 4.5
MiniMax: MiniMax M2.7
Anthropic: Claude Opus 4.1
Anthropic: Claude Sonnet 4.5
Moonshot AI: Kimi K2.5
DeepSeek: DeepSeek V3.2
Google: Gemini 3 Flash
Grok: Grok 4.20 (Reasoning)
Anthropic: Claude Sonnet 4
Google: Gemma 4 31B
Grok: Grok 4.20
OpenAI: GPT-5 Mini
Anthropic: Claude Haiku 4.5
Google: Gemini 2.5 Pro
Google: Gemini 3.1 Flash Lite
Grok: Grok 4.1 Fast (Reasoning)
Xiaomi: MiMo-V2-Pro
OpenAI: GPT-5 Nano
OpenAI: GPT-5.4 Nano
NVIDIA: Nemotron 3 Super
OpenAI: GPT-5.4 Mini
Google: Gemini 2.5 Flash
OpenAI: GPT OSS 120B
Grok: Grok 4.1 Fast
Google: Gemini 2.5 Flash Lite
Arcee AI: Trinity Large Thinking