OpenAI

GPT-5

Name: OpenAI GPT-5
Author: OpenAI

Model ID: gpt-5-2025-08-07

2025-08-07

GPT-5 is OpenAI's unified frontier model, released in August 2025, that combines advanced reasoning, coding, and multimodal capabilities in a single system. It introduced test-time compute scaling with configurable thinking depth, significantly reducing hallucinations and sycophancy compared to previous models. GPT-5 is strongest on complex multi-step tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes scenarios, with notable improvements in coding, writing, and factual reliability.

API | Vision, Reasoning, File | Proprietary Model

Knowledge Cutoff

2024-09-30

The date this AI finished learning. It may not know about things that happened after this date.

Input → Output Format

The types of content this AI can receive, and what it can produce in return.

Context Memory

400K in / 128K out

The maximum amount of text the AI can read and process in a single request. A larger number means it can handle longer documents or conversations.
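As a rough illustration of how these limits might be applied, the sketch below checks whether a request fits. It assumes that "K" means 1,000 tokens and that the input and output figures are independent limits; the provider may count them differently (for example, as a shared window), so treat the function name and constants as hypothetical.

```python
# Limits copied from the "Context Memory" figures on this page.
# Assumption: "K" = 1,000 tokens, and the two limits apply independently.
MAX_INPUT_TOKENS = 400_000
MAX_OUTPUT_TOKENS = 128_000


def fits_in_context(input_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if both token counts are within the listed limits."""
    if input_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        input_tokens <= MAX_INPUT_TOKENS
        and requested_output_tokens <= MAX_OUTPUT_TOKENS
    )


if __name__ == "__main__":
    print(fits_in_context(350_000, 8_000))   # within both limits
    print(fits_in_context(450_000, 8_000))   # input exceeds the listed limit
```

Token counts depend on the tokenizer, so a word count is only an approximation of what the model actually receives.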

Cost/1M Tokens

$1.25 in / $10 out

The cost of using this AI directly in your own application. Shown in USD per 1 million units of text (tokens).
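The per-million pricing can be turned into a per-request estimate with simple arithmetic. The sketch below uses the $1.25 input and $10 output rates listed on this page; the function name is hypothetical, and it ignores any discounts, cached-input pricing, or provider-specific fees that may apply.

```python
# Rates copied from this page, in USD per 1 million tokens.
INPUT_USD_PER_MILLION = 1.25
OUTPUT_USD_PER_MILLION = 10.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


if __name__ == "__main__":
    # 200,000 input tokens cost $0.25 and 10,000 output tokens cost $0.10.
    print(f"${estimate_cost_usd(200_000, 10_000):.2f}")
```

Because output tokens cost eight times as much as input tokens at these rates, long generated answers tend to dominate the bill more than long prompts do.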

Source: Official Docs, OpenRouter

AI Performance Evaluation

Arena Overall Score: 1434 ±5 (as of 2026-04-23)

Overall Rank: No. 54 (31,986 votes)

Arena by Ability

Hard Prompts: 1446 ±6 (No. 65)
Expert Knowledge: 1459 ±16 (No. 47)
Instruction Following: 1409 ±7 (No. 72)
Conversation Memory: 1420 ±9 (No. 73)
Creative: 1375 ±10 (No. 96)
Coding: 1466 ±8 (No. 67)
Math: 1434 ±14 (No. 46)

Arena by Occupation

Creative Writing: 1397 ±8 (No. 76)
Social Sciences: 1443 ±9 (No. 66)
Media: 1397 ±8 (No. 60)
Business: 1414 ±9 (No. 76)
Healthcare: 1456 ±15 (No. 56)
Legal: 1455 ±14 (No. 45)
Software: 1452 ±7 (No. 78)
Mathematics: 1441 ±14 (No. 49)

Source: Arena Intelligence

Overall

AA Intelligence Index: 22% (↓17%)
LiveBench: 71% (↑11%)
ForecastBench: 61% (↑2%)

Reasoning & Math

AA Math Index: 48% (↓25%)
GPQA Diamond: 69% (↓12%)
HLE: 5.8% (↓11%)
MMLU-Pro: 82% (↑0%)
AIME 2025: 48% (↓25%)
LB Reasoning: 82% (↑22%)
LB Math: 86% (↑13%)
LB Data: 57% (↑7%)

Coding

AA Coding Index: 21% (↓13%)
LiveCodeBench: 54% (↓11%)
LB Coding: 72% (↓2%)
LB Agentic: 52% (↑8%)
TAU2: 0.0% (↓73%)
TerminalBench: 13% (↓18%)
SciCode: 38% (↓3%)

Language & Instructions

IFBench: 45% (↓12%)
AA-LCR: 64% (↑2%)
Hallucination (HHEM): 15% (↑5%)
Factual (HHEM): 85% (↓5%)
LB Language: 81% (↑9%)
LB IF: 64% (↑18%)

Output Speed

Standard Mode: 77 tok/s (↓5), First Output 1.03 s

Reasoning Mode: 85 tok/s (↓3), First Output 41.72 s
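The Standard Mode and Reasoning Mode figures above can be combined into a rough end-to-end response time. The sketch below assumes that "First Output" is the delay before the first token arrives and that tokens then stream at the listed rate without interruption. Both are simplifying assumptions, and the names used are hypothetical.

```python
# Figures copied from the "Output Speed" section of this page.
# Assumption: "First Output" is time to first token, and throughput is constant.
SPEED_PROFILES = {
    "standard": {"first_output_s": 1.03, "tokens_per_s": 77.0},
    "reasoning": {"first_output_s": 41.72, "tokens_per_s": 85.0},
}


def estimate_response_seconds(mode: str, output_tokens: int) -> float:
    """Estimate total time to receive output_tokens in the given mode."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    profile = SPEED_PROFILES[mode]  # raises KeyError for an unknown mode
    streaming_s = output_tokens / profile["tokens_per_s"]
    return profile["first_output_s"] + streaming_s


if __name__ == "__main__":
    for mode in SPEED_PROFILES:
        seconds = estimate_response_seconds(mode, 1_000)
        print(f"{mode}: about {seconds:.1f} s for 1,000 output tokens")
```

Under these assumptions the reasoning mode's slightly higher throughput does not offset its much longer wait for the first output unless the response is very long.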

Source: Artificial Analysis, LiveBench, ForecastBench, Vectara HHEM
