Grok

Grok 4.20 (Reasoning)

2026-03-31

Grok 4.20 (Reasoning) is the reasoning-enabled configuration of xAI's Grok 4.20. It uses extended internal thinking to work through a problem before presenting an answer. Combined with the model's native multi-agent architecture and cross-agent verification, this gives it the highest accuracy in the Grok lineup on tasks that require deep logic, mathematical reasoning, and complex multi-step problem solving. It keeps the same 2M-token context window and strict prompt adherence, and is described as having the lowest hallucination rate in its class.

Grok SuperGrok Heavy, API | Vision, Reasoning, Web Search, File | Proprietary Model
Knowledge Cutoff: Unknown
Input → Output Format
Context Memory: 2M in, 2M out
Cost/1M Words: $2 in, $6 out
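The listed prices translate into a per-request cost with simple arithmetic. The sketch below assumes the page's rates of $2 per 1M input units and $6 per 1M output units apply linearly with no minimums, caching discounts, or tiering; the function name and the example request sizes are hypothetical, chosen only for illustration.

```python
# Rates as listed on this page, in USD per 1,000,000 units.
INPUT_PRICE_PER_MILLION = 2.00
OUTPUT_PRICE_PER_MILLION = 6.00


def estimate_cost(input_units: int, output_units: int) -> float:
    """Return the estimated cost in USD, assuming purely linear pricing."""
    total = (input_units * INPUT_PRICE_PER_MILLION
             + output_units * OUTPUT_PRICE_PER_MILLION)
    return total / 1_000_000


# Hypothetical request: 500,000 units in, 100,000 units out.
print(f"${estimate_cost(500_000, 100_000):.2f}")  # prints $1.60
```

Because output is priced at three times the input rate, long generated answers tend to dominate the bill even when the prompt is much larger than the response.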

AI Performance Evaluation

Arena Overall Score: 1482 ±6 (as of 2026-04-23)
Overall Rank: No. 8
Votes: 14,620
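One way to read an arena score is as an expected head-to-head win rate. The sketch below assumes the score behaves like a conventional Elo-style rating (logistic curve, base 10, scale 400), which this page does not state; the opponent rating of 1500 is hypothetical and used only to show the arithmetic.

```python
def expected_win_rate(rating: float, opponent: float) -> float:
    """Elo-style expected score of `rating` against `opponent`.

    Assumes the standard logistic form with base 10 and scale 400.
    """
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


SCORE = 1482      # overall arena score listed on this page
MARGIN = 6        # the listed +/- interval
OPPONENT = 1500   # hypothetical opponent rating, for illustration only

for label, rating in [("low", SCORE - MARGIN), ("mid", SCORE), ("high", SCORE + MARGIN)]:
    print(f"{label}: {expected_win_rate(rating, OPPONENT):.3f}")
```

Under this assumption, a gap of a few dozen points corresponds to only a modest shift away from a 50% win rate, so adjacent ranks with overlapping intervals are hard to tell apart.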
Arena by Ability
Hard Prompts: 1495 ±7, No. 14
Expert Knowledge: 1469 ±18, No. 39
Instruction Following: 1455 ±9, No. 21
Conversation Memory: 1491 ±13, No. 10
Creative: 1467 ±13, No. 8
Coding: 1513 ±10, No. 20
Math: 1457 ±18, No. 25
Arena by Occupation
Creative Writing: 1458 ±11, No. 12
Social Sciences: 1487 ±12, No. 13
Media: 1457 ±12, No. 9
Business: 1472 ±12, No. 13
Healthcare: 1517 ±19, No. 4
Legal: 1502 ±18, No. 6
Software: 1511 ±8, No. 13
Mathematics: 1459 ±22, No. 30
Overall
AA Intelligence Index: 49% (↑11%)
LiveBench: 69% (↑9%)
Reasoning & Math
GPQA Diamond: 91% (↑10%)
HLE: 32% (↑15%)
LB Reasoning: 75% (↑16%)
LB Math: 87% (↑14%)
LB Data: 63% (↑13%)
Coding
AA Coding Index: 41% (↑6%)
LB Coding: 66% (↓8%)
LB Agentic: 43% (↑0%)
TAU2: 93% (↑20%)
TerminalBench: 38% (↑7%)
SciCode: 46% (↑5%)
Language & Instructions
IFBench: 81% (↑24%)
AA-LCR: 58% (↓4%)
LB Language: 78% (↑6%)
LB IF: 63% (↑17%)
Output Speed
Standard Mode: 113 tok/s (↑31), First Output 0.42 s
Reasoning Mode: 110 tok/s (↑22), First Output 27.83 s
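The speed figures can be combined into a rough end-to-end response time: time to first output plus output length divided by throughput. This is a minimal sketch that assumes throughput stays constant after the first output and that "First Output" measures the delay before visible text begins; the 1,000-token response length is a hypothetical example.

```python
def estimate_response_seconds(first_output_s: float,
                              tokens_per_second: float,
                              output_tokens: int) -> float:
    """Approximate total time: initial delay plus steady-state generation."""
    return first_output_s + output_tokens / tokens_per_second


# Figures listed on this page: (first output in seconds, tokens per second).
MODES = {
    "Standard Mode": (0.42, 113.0),
    "Reasoning Mode": (27.83, 110.0),
}

OUTPUT_TOKENS = 1_000  # hypothetical response length

for mode, (first_output_s, rate) in MODES.items():
    total = estimate_response_seconds(first_output_s, rate, OUTPUT_TOKENS)
    print(f"{mode}: {total:.2f} s")
# prints:
# Standard Mode: 9.27 s
# Reasoning Mode: 36.92 s
```

The two modes generate at nearly the same rate, so under these assumptions almost all of the difference comes from the longer wait before the first output in Reasoning Mode.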