Claude Sonnet 4 is Anthropic's balanced mid-tier model, released alongside Opus 4 in May 2025 and designed to combine strong coding and reasoning capabilities with computational efficiency. It achieves a state-of-the-art 72.7% on SWE-bench while offering significantly lower cost and faster response times than the Opus models. Key strengths include autonomous codebase navigation, reduced error rates in agent-driven workflows, and high reliability in following intricate instructions, making it a versatile choice for both routine and complex development tasks.
API | Vision | Reasoning | Web Search | File | Proprietary Model
Knowledge Cutoff: 2025-01-31
Context Memory: 1M tokens in / 64K tokens out
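The context limits above (1M tokens in, 64K tokens out) can be turned into a rough pre-flight check before sending a request. This is a minimal sketch: the limits are taken from this page, and the characters-per-token estimate is a crude heuristic, not the model's actual tokenizer.

```python
# Context limits as listed on this page (assumed values, not an API contract).
MAX_INPUT_TOKENS = 1_000_000
MAX_OUTPUT_TOKENS = 64_000

def estimate_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_context(prompt: str, requested_output_tokens: int) -> bool:
    """True if the request stays within both the input and output budgets."""
    return (estimate_tokens(prompt) <= MAX_INPUT_TOKENS
            and requested_output_tokens <= MAX_OUTPUT_TOKENS)

print(fits_context("Summarize this file.", 4_000))   # small request fits
print(fits_context("x" * 5_000_000, 4_000))          # ~1.25M tokens: too big
```

For real request sizing, the provider's token-counting endpoint or tokenizer should replace the heuristic here.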
AI Performance Evaluation
Arena Overall Score: 1399 ±4 (as of 2026-04-23)
Overall Rank: No.102 (35,153 votes)
Arena by Ability
Hard Prompts: 1430 ±6 (No.85)
Expert Knowledge: 1433 ±15 (No.79)
Instruction Following: 1414 ±7 (No.68)
Conversation Memory: 1420 ±8 (No.74)
Creative: 1395 ±9 (No.64)
Coding: 1472 ±8 (No.60)
Math: 1402 ±13 (No.97)
Arena by Occupation
Creative Writing: 1397 ±7 (No.77)
Social Sciences: 1418 ±8 (No.97)
Media: 1389 ±8 (No.76)
Business: 1384 ±8 (No.117)
Healthcare: 1419 ±13 (No.106)
Legal: 1410 ±13 (No.96)
Software: 1443 ±6 (No.86)
Mathematics: 1410 ±13 (No.98)
Source: Arena Intelligence
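The Arena scores above are Elo-style ratings, so a rating gap between two models (or two ability categories) maps to an expected head-to-head win rate. A minimal sketch, assuming the standard 400-point logistic Elo scale (the usual convention for arena leaderboards, not something stated on this page):

```python
def win_probability(rating_a: float, rating_b: float) -> float:
    """Expected win rate of A over B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Example using numbers from this page: the Coding score (1472) against the
# overall score (1399) is a 73-point gap, i.e. roughly a 60% expected win rate.
print(round(win_probability(1472, 1399), 3))  # -> 0.604
```

Equal ratings give exactly 0.5, and each additional 400 points of gap multiplies the opponent's odds by 10.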
Overall
AA Intelligence Index: 39% (↑0%)
LiveBench: 61% (↑0%)
ForecastBench: 59% (↑0%)
Reasoning & Math
AA Math Index: 74% (↑1%)
GPQA Diamond: 78% (↓3%)
HLE: 9.6% (↓8%)
MMLU-Pro: 84% (↑2%)
AIME 2025: 74% (↑1%)
MATH-500: 99% (↑6%)
LB Reasoning: 69% (↑9%)
LB Math: 71% (↓3%)
LB Data: 55% (↑5%)
Coding
AA Coding Index: 34% (↑0%)
LiveCodeBench: 66% (↑0%)
LB Coding: 77% (↑4%)
LB Agentic: 40% (↓3%)
TAU2: 65% (↓9%)
TerminalBench: 31% (↑0%)
SciCode: 40% (↓1%)
Language & Instructions
IFBench: 55% (↓2%)
AA-LCR: 65% (↑3%)
Hallucination (HHEM): 10% (↑0%)
Factual (HHEM): 90% (↑0%)
LB Language: 73% (↑1%)
LB IF: 44% (↓2%)
Output Speed
Standard Mode: 45 tok/s (↓37), first output in 0.80s
Reasoning Mode: 63 tok/s (↓25), first output in 9.28s