Claude Opus 4.1, released in August 2025, is an updated version of Anthropic's flagship model, offering improved performance in coding, reasoning, and agentic tasks. It achieves 74.5% on SWE-bench Verified and shows notable gains in multi-file code refactoring, debugging precision, and detail-oriented reasoning. The model supports extended thinking up to 64K tokens and is optimized for research, data analysis, and tool-assisted reasoning workflows.
API | Vision | Reasoning | Web Search | File | Proprietary Model
Knowledge Cutoff
2025-01-31
Context Memory (Input → Output)
200K in / 32K out
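As a hedged sketch of how these limits apply in practice: the payload below targets the Anthropic Messages API with an extended-thinking budget inside the 64K ceiling stated above. The model ID, request shape, and `budget_tokens` field are assumptions from public SDK documentation, not from this page, and the values are illustrative.

```python
# Illustrative request payload for Claude Opus 4.1 with extended thinking.
# Assumption: Messages API shape and "claude-opus-4-1" model ID come from
# the public Anthropic SDK docs; token values here are examples only.

MAX_OUTPUT_TOKENS = 32_000   # 32K output limit stated above
THINKING_BUDGET = 16_000     # illustrative; the page cites up to 64K

request = {
    "model": "claude-opus-4-1",
    "max_tokens": MAX_OUTPUT_TOKENS,
    "thinking": {"type": "enabled", "budget_tokens": THINKING_BUDGET},
    "messages": [
        {"role": "user", "content": "Summarize this report."},
    ],
}

# With the SDK installed and ANTHROPIC_API_KEY set, this would be sent as:
#   import anthropic
#   response = anthropic.Anthropic().messages.create(**request)
print(request["model"], request["thinking"]["budget_tokens"])
```

The thinking budget is kept well below `max_tokens` headroom here only for illustration; actual accounting of thinking tokens is defined by the API, not this page.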
AI Performance Evaluation
Arena Overall Score
1449 ±4 (as of 2026-04-23)
Overall Rank
No.37
49,864 Votes
Arena by Ability
Hard Prompts: 1480 ±5 (No.27)
Expert Knowledge: 1482 ±12 (No.29)
Instruction Following: 1459 ±6 (No.17)
Conversation Memory: 1473 ±7 (No.24)
Creative: 1445 ±8 (No.22)
Coding: 1512 ±7 (No.21)
Math: 1443 ±11 (No.38)
Arena by Occupation
Creative Writing: 1444 ±6 (No.25)
Social Sciences: 1471 ±7 (No.31)
Media: 1433 ±7 (No.26)
Business: 1448 ±7 (No.36)
Healthcare: 1478 ±12 (No.28)
Legal: 1463 ±11 (No.31)
Software: 1492 ±5 (No.30)
Mathematics: 1449 ±12 (No.40)
Source: Arena Intelligence
Overall
AA Intelligence Index: 42% (↑4%)
LiveBench: 61% (↑1%)
ForecastBench: 60% (↑1%)
Reasoning & Math
AA Math Index: 80% (↑7%)
GPQA Diamond: 81% (↑0%)
HLE: 12% (↓5%)
MMLU-Pro: 88% (↑6%)
AIME 2025: 80% (↑7%)
LB Reasoning: 72% (↑13%)
LB Math: 73% (↑0%)
LB Data: 49% (↓1%)
Coding
AA Coding Index: 37% (↑2%)
LiveCodeBench: 65% (↑0%)
LB Coding: 75% (↑1%)
LB Agentic: 48% (↑5%)
TAU2: 71% (↓2%)
TerminalBench: 34% (↑3%)
SciCode: 41% (↑0%)
Language & Instructions
IFBench: 55% (↓1%)
AA-LCR: 66% (↑5%)
Hallucination (HHEM): 12% (↑2%)
Factual (HHEM): 88% (↓2%)
LB Language: 73% (↑1%)
LB IF: 42% (↓4%)
Output Speed
Standard Mode: 34 tok/s (↓48), first output 1.33s
Reasoning Mode: 45 tok/s (↓43), first output 9.10s
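The throughput and time-to-first-token figures above combine into a rough end-to-end latency estimate via latency ≈ TTFT + tokens / throughput. A minimal sketch using the reported numbers (the 1000-token response length is an arbitrary example):

```python
# Rough latency estimate from the speed figures on this page:
# total time ≈ time_to_first_token + output_tokens / throughput.

def latency_s(tokens: int, tok_per_s: float, ttft_s: float) -> float:
    """Estimated seconds to stream `tokens` output tokens."""
    return ttft_s + tokens / tok_per_s

# 1000-token response, using the reported mode figures.
standard = latency_s(1000, 34, 1.33)   # standard mode: 34 tok/s, 1.33s TTFT
reasoning = latency_s(1000, 45, 9.10)  # reasoning mode: 45 tok/s, 9.10s TTFT

print(round(standard, 1), round(reasoning, 1))  # prints: 30.7 31.3
```

Note the crossover: reasoning mode streams faster per token, but its much longer time to first output makes it slower overall for short responses.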