AI Model Comparison


Grok 4.20 is xAI's newest flagship model, released in February 2026. It introduces a native four-agent architecture in which specialized AI agents collaborate simultaneously on complex queries. It maintains a 2M-token context window, the largest among Western frontier models, and achieves a 65% reduction in hallucination rates through cross-agent verification. The model's capabilities are updated weekly based on real-world usage, and it delivers fast, direct answers at 232 tokens per second with a 0.54-second time to first token.
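The throughput and time-to-first-token figures quoted above can be turned into a rough wall-clock estimate for a response. The sketch below assumes a simple model that is not stated in the source: output streams at a constant rate once the first token arrives. The function name `estimated_response_seconds` is hypothetical.

```python
def estimated_response_seconds(
    output_tokens: int,
    tokens_per_second: float,
    ttft_seconds: float,
) -> float:
    """Rough response time: time to first token plus streaming time.

    Assumption (not from the source): tokens arrive at a constant rate
    after the first token, with no queueing or network overhead.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return ttft_seconds + output_tokens / tokens_per_second


# Figures quoted in the description: 232 tok/s and 0.54 s time to first token.
quoted = estimated_response_seconds(1000, tokens_per_second=232.0, ttft_seconds=0.54)
print(f"1,000 output tokens: about {quoted:.2f} s")
```

Under these assumptions a 1,000-token answer takes a little under five seconds. Measured throughput varies by provider and load, so the estimate is only as good as the speed figure fed into it.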

Author: Grok
Release Date: 2026-03-09
Knowledge Cutoff: Unknown
License: Proprietary
I/O Format:
Context Length: 2M / 2M
API Price, Input / Output (per 1M tokens): $2 / $6
How to Use: Grok SuperGrok Heavy or above / API Access
Output Speed: 107 tok/s

Arena Overall: 1482
Intelligence Index: 29.0
Coding Index: 22.0
Math Index: n/a
LiveBench: 37.9
ForecastBench: 61.8
GPQA Diamond: 77.6%
HLE: 24.2%
MMLU-Pro: n/a
AIME 2025: n/a
MATH-500: n/a
LB Reasoning: 25.6
LB Math: 45.5
LB Data Analysis: 43.5
LiveCodeBench: n/a
LB Coding: 58.5
LB Agentic: 38.3
TAU2: 59.9%
TerminalBench: 16.7%
SciCode: 32.8%
IFBench: 49.3%
AA-LCR: 0.2
Hallucination (HHEM): n/a
Factual Consistency (HHEM): n/a
LB Language: 42.0
LB Instruction Following: 24.4
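
The listed API price of $2 / $6 can be used to estimate what a single request costs. The sketch below is a minimal illustration that reads the two figures as USD per 1M input tokens and USD per 1M output tokens. It assumes flat pricing with no tiers, caching discounts, or per-agent surcharges, none of which the listing specifies. The names `request_cost_usd`, `INPUT_USD_PER_MILLION`, and `OUTPUT_USD_PER_MILLION` are hypothetical.

```python
# Prices as listed above, read as USD per 1M tokens (assumption: flat rates).
INPUT_USD_PER_MILLION = 2.0
OUTPUT_USD_PER_MILLION = 6.0


def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_rate: float = INPUT_USD_PER_MILLION,
    output_rate: float = OUTPUT_USD_PER_MILLION,
) -> float:
    """Estimated cost of one request in USD under flat per-token pricing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Example: a 200,000-token prompt with a 4,000-token answer.
cost = request_cost_usd(200_000, 4_000)
print(f"Estimated cost: ${cost:.3f}")
```

With these rates, output tokens cost three times as much as input tokens, so long answers affect the bill more than long prompts of the same size.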
