Grok 4.20

Name: Grok 4.20
Author: Grok

Model ID: grok-4.20-0309-non-reasoning

2026-03-09

Grok 4.20 is xAI's newest flagship model, released in February 2026. It introduces a native 4-agent multi-agent architecture in which specialized AI agents collaborate simultaneously on complex queries. It maintains a 2M-token context window, the largest among Western frontier models, and achieves a 65% reduction in hallucination rates through cross-agent verification. The model updates its capabilities weekly based on real-world usage and delivers fast direct answers at 232 tokens per second with a 0.54-second time-to-first-token.

Grok, SuperGrok Heavy, API | Proprietary Model

Knowledge Cutoff

Unknown

The date this AI finished learning. It may not know about things that happened after this date.

Input → Output Format

The types of content this AI can receive, and what it can produce in return.

Context Memory

2M in / 2M out

The maximum amount of text the AI can read and process in a single request. A larger number means it can handle longer documents or conversations.

Cost/1M Tokens

$2 in / $6 out

The cost of using this AI directly in your own application. Shown in USD per 1 million units of text (tokens).
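
The per-token pricing above can be turned into a rough per-request estimate. The following is a minimal sketch, assuming the listed rates ($2 per 1M input tokens, $6 per 1M output tokens) apply uniformly with no caching discounts or tiered pricing; the function name `estimate_cost_usd` and the example token counts are hypothetical and chosen only for illustration.

```python
# Rates copied from the listing above, in USD per 1 million tokens.
# Assumption: flat pricing, no caching discounts or long-context tiers.
INPUT_USD_PER_MILLION = 2.0
OUTPUT_USD_PER_MILLION = 6.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total_micro_usd = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total_micro_usd / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 50,000 tokens in, 2,000 tokens out.
    print(f"${estimate_cost_usd(50_000, 2_000):.3f}")
```

Output tokens cost three times as much as input tokens at these rates, so long generated answers can dominate the bill even when the prompt is much larger.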

Source: Official Docs, OpenRouter

AI Performance Evaluation

Arena Overall Score: 1482 ±6 (as of 2026-04-23)

Overall Rank: No. 8 (14,620 votes)

Arena by Ability

Hard Prompts: 1495 ±7, No. 14
Expert Knowledge: 1469 ±18, No. 39
Instruction Following: 1455 ±9, No. 21
Conversation Memory: 1491 ±13, No. 10
Creative: 1467 ±13, No. 8
Coding: 1513 ±10, No. 20
Math: 1457 ±18, No. 25

Arena by Occupation

Creative Writing: 1458 ±11, No. 12
Social Sciences: 1487 ±12, No. 13
Media: 1457 ±12, No. 9
Business: 1472 ±12, No. 13
Healthcare: 1517 ±19, No. 4
Legal: 1502 ±18, No. 6
Software: 1511 ±8, No. 13
Mathematics: 1459 ±22, No. 30

Source: Arena Intelligence

Overall

AA Intelligence Index: 29% (↓9%)
LiveBench: 38% (↓22%)
ForecastBench: 62% (↑3%)

Reasoning & Math

GPQA Diamond: 78% (↓3%)
HLE: 24% (↑7%)
LB Reasoning: 26% (↓34%)
LB Math: 46% (↓28%)
LB Data: 43% (↓6%)

Coding

AA Coding Index: 22% (↓12%)
LB Coding: 59% (↓15%)
LB Agentic: 38% (↓5%)
TAU2: 60% (↓13%)
TerminalBench: 17% (↓14%)
SciCode: 33% (↓8%)

Language & Instructions

IFBench: 49% (↓7%)
AA-LCR: 17% (↓44%)
LB Language: 42% (↓30%)
LB IF: 24% (↓22%)

Output Speed

Standard Mode: 107 tok/s (↑25), First Output 0.43s
Reasoning Mode: 248 tok/s (↑160), First Output 11.74s
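
The throughput and first-output figures above can be combined into a rough response-time estimate. This is a minimal sketch, assuming that "First Output" is the time until the first token arrives and that tokens then stream at the listed constant rate; real latency varies with load and prompt length, and reasoning mode may also spend tokens on reasoning before the visible answer. The function name `estimate_response_seconds` and the 500-token answer length are hypothetical.

```python
def estimate_response_seconds(
    output_tokens: int,
    tokens_per_second: float,
    first_output_seconds: float,
) -> float:
    """Estimate total response time: wait for the first token, then stream."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return first_output_seconds + output_tokens / tokens_per_second


if __name__ == "__main__":
    # Figures copied from the listing above; 500 tokens is an illustrative length.
    standard = estimate_response_seconds(500, 107.0, 0.43)
    reasoning = estimate_response_seconds(500, 248.0, 11.74)
    print(f"standard mode:  {standard:.2f} s")
    print(f"reasoning mode: {reasoning:.2f} s")
```

Under these assumptions, the long first-output delay in reasoning mode outweighs its higher streaming rate for short answers.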

Source: Artificial Analysis, LiveBench, ForecastBench