Grok 4.20 is xAI's newest flagship model, released in February 2026. It introduces a native multi-agent architecture in which four specialized agents work on a complex query in parallel. It keeps a 2M-token context window, the largest among Western frontier models, and reports a 65% reduction in hallucination rate, attributed to cross-agent verification. The model's capabilities are updated weekly based on real-world usage, and direct answers are delivered at 232 tokens per second with a 0.54-second time to first token.
Grok, SuperGrok Heavy, API | Proprietary Model
Knowledge Cutoff: Unknown
Input → Output Format
Context Memory: 2M in / 2M out
AI Performance Evaluation
Arena Overall Score: 1482 ±6 (as of 2026-04-23)
Overall Rank: No. 8 (14,620 votes)
Arena by Ability
Hard Prompts: 1495 ±7 (No. 14)
Expert Knowledge: 1469 ±18 (No. 39)
Instruction Following: 1455 ±9 (No. 21)
Conversation Memory: 1491 ±13 (No. 10)
Creative: 1467 ±13 (No. 8)
Coding: 1513 ±10 (No. 20)
Math: 1457 ±18 (No. 25)
Arena by Occupation
Creative Writing: 1458 ±11 (No. 12)
Social Sciences: 1487 ±12 (No. 13)
Media: 1457 ±12 (No. 9)
Business: 1472 ±12 (No. 13)
Healthcare: 1517 ±19 (No. 4)
Legal: 1502 ±18 (No. 6)
Software: 1511 ±8 (No. 13)
Mathematics: 1459 ±22 (No. 30)
Source: Arena Intelligence
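To help read the scores above, here is a minimal sketch of two small calculations: the range implied by a "score ± margin" entry, and the expected head-to-head win rate between two scores. It assumes the Arena scores are Elo-style ratings on the conventional 400-point logistic scale, which the page does not state. The function names and the opponent score of 1500 are hypothetical and chosen only for illustration.

```python
def score_interval(score: float, margin: float) -> tuple[float, float]:
    """Return the (low, high) range implied by a 'score ± margin' entry."""
    return (score - margin, score + margin)


def expected_win_rate(score: float, opponent_score: float, scale: float = 400.0) -> float:
    """Expected win rate under the standard Elo logistic formula.

    ASSUMPTION: the leaderboard uses an Elo-style scale where a gap of
    `scale` points corresponds to 10:1 odds. The page does not confirm this.
    """
    return 1.0 / (1.0 + 10.0 ** ((opponent_score - score) / scale))


if __name__ == "__main__":
    low, high = score_interval(1482, 6)  # Arena Overall Score from the listing
    print(f"Overall score range: {low:.0f} to {high:.0f}")

    hypothetical_opponent = 1500  # illustrative value, not from the listing
    rate = expected_win_rate(1482, hypothetical_opponent)
    print(f"Expected win rate vs. {hypothetical_opponent}: {rate:.1%}")
```

Under this assumption, a gap of a few points maps to a win rate close to 50%, so small rank differences between neighbouring models should not be over-interpreted.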
Overall
AA Intelligence Index: 29% (↓9%)
LiveBench: 38% (↓22%)
ForecastBench: 62% (↑3%)
Reasoning & Math
GPQA Diamond: 78% (↓3%)
HLE: 24% (↑7%)
LB Reasoning: 26% (↓34%)
LB Math: 46% (↓28%)
LB Data: 43% (↓6%)
Coding
AA Coding Index: 22% (↓12%)
LB Coding: 59% (↓15%)
LB Agentic: 38% (↓5%)
TAU2: 60% (↓13%)
TerminalBench: 17% (↓14%)
SciCode: 33% (↓8%)
Language & Instructions
IFBench: 49% (↓7%)
AA-LCR: 17% (↓44%)
LB Language: 42% (↓30%)
LB IF: 24% (↓22%)
Output Speed
Standard Mode: 107 tok/s (↑25), first output 0.43 s
Reasoning Mode: 248 tok/s (↑160), first output 11.74 s
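The two speed figures interact: Reasoning Mode streams faster but starts much later. As a rough sketch, total response time can be estimated as first-output delay plus output length divided by throughput. This assumes a constant streaming rate after the first token and that the listed tok/s counts only visible output tokens. Neither assumption is confirmed by the page, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeedProfile:
    """Throughput and first-output delay for one mode, as listed above."""
    tokens_per_second: float
    first_output_seconds: float

    def response_seconds(self, output_tokens: int) -> float:
        # ASSUMPTION: constant streaming rate once the first token arrives.
        return self.first_output_seconds + output_tokens / self.tokens_per_second


STANDARD = SpeedProfile(tokens_per_second=107, first_output_seconds=0.43)
REASONING = SpeedProfile(tokens_per_second=248, first_output_seconds=11.74)


def break_even_tokens(a: SpeedProfile, b: SpeedProfile) -> float:
    """Output length at which both profiles finish at the same moment.

    Solves a.first + n / a.rate == b.first + n / b.rate for n.
    """
    rate_gap = 1.0 / a.tokens_per_second - 1.0 / b.tokens_per_second
    if rate_gap == 0:
        raise ValueError("profiles stream at the same rate; no break-even point")
    return (b.first_output_seconds - a.first_output_seconds) / rate_gap


if __name__ == "__main__":
    for n in (200, 1000, 4000):
        print(
            f"{n:>5} tokens: standard {STANDARD.response_seconds(n):.1f} s, "
            f"reasoning {REASONING.response_seconds(n):.1f} s"
        )
    print(f"Break-even length: {break_even_tokens(STANDARD, REASONING):.0f} tokens")
```

Under these assumptions, Standard Mode finishes first for short answers, and Reasoning Mode only catches up once the output runs to a couple of thousand tokens.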