AI Model Comparison


Grok 4.20 (Reasoning) is the reasoning-enabled configuration of xAI's Grok 4.20. It uses extended internal thinking to work through a problem before presenting an answer. Combined with the model's native multi-agent architecture and cross-agent verification, this gives it the highest accuracy in the Grok lineup on tasks that require deep logic, mathematical reasoning, and complex multi-step problem solving. It keeps the standard Grok 4.20's 2M-token context window and strict prompt adherence, along with what is described as the lowest hallucination rate among models in its class.

Author

xAI

Release Date

2026-03-31

Knowledge Cutoff

Unknown

License

Proprietary

I/O Format

Context Length

2M / 2M

API Price per 1M Tokens (Input / Output)

$2 / $6

How to Use

Grok app with a SuperGrok Heavy (or higher) subscription / API access

Output Speed

113 tok/s

Arena Overall

1482

Intelligence Index

49.3

Coding Index

40.5

Math Index

—

LiveBench

69.0

ForecastBench

—

GPQA Diamond

91.1%

HLE

32.2%

MMLU-Pro

—

AIME 2025

—

MATH-500

—

LB Reasoning

75.3

LB Math

87.1

LB Data Analysis

62.9

LiveCodeBench

—

LB Coding

66.1

LB Agentic

43.3

TAU2

93.0%

TerminalBench

37.9%

SciCode

45.6%

IFBench

81.2%

AA-LCR

0.6

Hallucination (HHEM)

—

Factual Consistency (HHEM)

—

LB Language

77.7

LB Instruction Following

63.4
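The listed API prices and output speed can be combined into a rough per-request estimate. The following is a minimal Python sketch, not an official calculator. It assumes that prices are in USD per 1M tokens, that reasoning (thinking) tokens are billed at the output rate, and that throughput holds steady at the listed 113 tok/s average, ignoring time-to-first-token and network overhead. The function names are hypothetical.

```python
# Rough cost/latency estimator built from the figures on this card.
# Assumptions (not stated on the card): prices are USD per 1M tokens,
# reasoning tokens are billed as output tokens, and generation runs at a
# constant average speed (no time-to-first-token or network overhead).

INPUT_PRICE_PER_M = 2.00   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 6.00  # USD per 1M output tokens
OUTPUT_SPEED_TPS = 113     # average output tokens per second
CONTEXT_LIMIT = 2_000_000  # 2M-token context window


def estimate_cost(input_tokens: int, output_tokens: int, reasoning_tokens: int = 0) -> float:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    if min(input_tokens, output_tokens, reasoning_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens > CONTEXT_LIMIT:
        raise ValueError("input exceeds the 2M-token context window")
    # Assumed: thinking tokens are charged at the output rate.
    billed_output = output_tokens + reasoning_tokens
    return (input_tokens * INPUT_PRICE_PER_M + billed_output * OUTPUT_PRICE_PER_M) / 1_000_000


def estimate_generation_seconds(output_tokens: int, reasoning_tokens: int = 0) -> float:
    """Return approximate generation time in seconds at the listed average speed."""
    return (output_tokens + reasoning_tokens) / OUTPUT_SPEED_TPS


if __name__ == "__main__":
    # Example: 100k-token prompt, 2k-token answer, 8k tokens of hidden reasoning.
    cost = estimate_cost(100_000, 2_000, reasoning_tokens=8_000)
    secs = estimate_generation_seconds(2_000, reasoning_tokens=8_000)
    print(f"${cost:.4f}, ~{secs:.0f} s")  # prints "$0.2600, ~88 s"
```

For a reasoning model, hidden thinking tokens can outnumber the visible answer, so they often dominate both cost and wall-clock time. That is why the sketch keeps them as a separate parameter rather than folding them into `output_tokens`.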
