Grok

Grok 4.20 (Reasoning)

Name: Grok 4.20 (Reasoning)
Author: xAI

Model ID: grok-4.20-0309-reasoning

2026-03-31

Grok 4.20 (Reasoning) is the reasoning-enabled configuration of xAI's Grok 4.20. It uses extended internal thinking to work through a problem before presenting an answer. Combined with the model's native multi-agent architecture and cross-agent verification, it delivers the highest accuracy in the Grok lineup on tasks that require deep logic, mathematical reasoning, and complex multi-step problem solving. It keeps the same 2M-token context window and strict prompt adherence as Grok 4.20, along with the lowest hallucination rate in its class.

Grok SuperGrok Heavy, API | Vision, Reasoning, Web Search, File | Proprietary Model

Knowledge Cutoff

Unknown

The date this AI finished learning. It may not know about things that happened after this date.

Input → Output Format

The types of content this AI can receive, and what it can produce in return.

Context Memory

2M in / 2M out

The maximum amount of text the AI can read and process in a single request. A larger number means it can handle longer documents or conversations.
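
As a rough illustration of what a 2M-token limit means in practice, the sketch below estimates whether a piece of text would fit. It assumes about 4 characters per token, which is a common rule of thumb and not a figure from this page; the real count depends on the model's tokenizer. The function names and the `reserved_output_tokens` parameter are hypothetical, and the listing does not say whether input and output share one window.

```python
import math

# From the listing above: 2M-token context window.
CONTEXT_WINDOW_TOKENS = 2_000_000

# Assumption: roughly 4 characters per token. This is a heuristic,
# not a property of the model's actual tokenizer.
ASSUMED_CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Roughly estimate the token count of a string (hypothetical helper)."""
    return math.ceil(len(text) / ASSUMED_CHARS_PER_TOKEN)


def fits_in_context(text: str, reserved_output_tokens: int = 0) -> bool:
    """Check whether the estimated input, plus any tokens set aside
    for the reply, stays within the context window."""
    needed = estimate_tokens(text) + reserved_output_tokens
    return needed <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    document = "word " * 100_000  # 500,000 characters
    print(estimate_tokens(document), fits_in_context(document))
```

For real workloads, counting tokens with the provider's own tokenizer is more reliable than a character-based estimate.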

Cost/1M Tokens

$2 in / $6 out

The cost of using this AI directly in your own application. Shown in USD per 1 million units of text (tokens).
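
The per-token pricing can be turned into a per-request estimate with simple arithmetic. The sketch below uses the listed rates of $2 per 1M input tokens and $6 per 1M output tokens; the helper name `estimate_cost_usd` is hypothetical. It assumes flat pricing with no caching discounts or tiering, and this page does not say how reasoning tokens are billed, so treat the result as a lower-bound estimate for reasoning-heavy requests.

```python
# Rates taken from the listing above: USD per 1 million tokens.
INPUT_USD_PER_MILLION = 2.00
OUTPUT_USD_PER_MILLION = 6.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes flat per-token pricing; any discounts, tiers, or separate
    billing for reasoning tokens are not modeled here.
    """
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Example: a 50,000-token prompt that yields a 4,000-token answer.
    print(f"${estimate_cost_usd(50_000, 4_000):.4f}")  # prints $0.1240
```

Output tokens cost three times as much as input tokens at these rates, so long answers dominate the bill faster than long prompts do.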

Source: Official Docs, OpenRouter

AI Performance Evaluation

Arena Overall Score

1482 ±6

As of 2026-04-23

Overall Rank

No.8

14,620 Votes

Arena by Ability

Hard Prompts: 1495 ±7, No.14

Expert Knowledge: 1469 ±18, No.39

Instruction Following: 1455 ±9, No.21

Conversation Memory: 1491 ±13, No.10

Creative: 1467 ±13, No.8

Coding: 1513 ±10, No.20

Math: 1457 ±18, No.25

Arena by Occupation

Creative Writing: 1458 ±11, No.12

Social Sciences: 1487 ±12, No.13

Media: 1457 ±12, No.9

Business: 1472 ±12, No.13

Healthcare: 1517 ±19, No.4

Legal: 1502 ±18, No.6

Software: 1511 ±8, No.13

Mathematics: 1459 ±22, No.30

Source: Arena Intelligence

Overall

AA Intelligence Index: 49% (↑11%)

LiveBench: 69% (↑9%)

Reasoning & Math

GPQA Diamond: 91% (↑10%)

HLE: 32% (↑15%)

LB Reasoning: 75% (↑16%)

LB Math: 87% (↑14%)

LB Data: 63% (↑13%)

Coding

AA Coding Index: 41% (↑6%)

LB Coding: 66% (↓8%)

LB Agentic: 43% (↑0%)

TAU2: 93% (↑20%)

TerminalBench: 38% (↑7%)

SciCode: 46% (↑5%)

Language & Instructions

IFBench: 81% (↑24%)

AA-LCR: 58% (↓4%)

LB Language: 78% (↑6%)

LB IF: 63% (↑17%)

Output Speed

Standard Mode: 113 tok/s (↑31)

First Output: 0.42 s

Reasoning Mode: 110 tok/s (↑22)

First Output: 27.83 s
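
The two speed figures combine into a rough end-to-end latency estimate: time to first output plus the number of output tokens divided by throughput. The sketch below applies that to the numbers listed above. The function name and the `STANDARD` / `REASONING` dictionaries are hypothetical, and the model is simplified: it assumes constant throughput once output starts and ignores network and queueing variance.

```python
# Figures from the listing above.
STANDARD = {"first_output_s": 0.42, "tokens_per_s": 113.0}
REASONING = {"first_output_s": 27.83, "tokens_per_s": 110.0}


def estimate_response_seconds(
    output_tokens: int, first_output_s: float, tokens_per_s: float
) -> float:
    """Estimate total response time (hypothetical helper).

    Simplification: time to first output, then a constant generation
    rate for every output token.
    """
    return first_output_s + output_tokens / tokens_per_s


if __name__ == "__main__":
    tokens = 1_000  # example answer length
    for label, mode in (("Standard", STANDARD), ("Reasoning", REASONING)):
        seconds = estimate_response_seconds(tokens, **mode)
        print(f"{label}: {seconds:.1f} s")
```

Throughput is nearly identical in the two modes, so under this simplified model the difference in total time comes almost entirely from the longer wait before the first output in reasoning mode.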

Source: Artificial Analysis, LiveBench
