GPT-4.1, released in April 2025, is OpenAI's flagship language model optimized for coding, instruction following, and long-context reasoning. It supports a 1-million-token context window (over 8× the capacity of GPT-4o) and achieves 54.6% on SWE-bench Verified, a major improvement on real-world software engineering tasks. The model excels at precise code diffs, agent reliability, and high recall across large document contexts, making it well suited for IDE tooling, automated coding agents, and enterprise knowledge retrieval.
API | Vision, Web Search, File | Proprietary Model
Knowledge Cutoff: 2024-06-30
Input → Output Format
Context Memory: 1.0M tokens in, 33K tokens out
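As a rough illustration of how the context figures above might be used, the sketch below checks a request against them before it is sent. The constants are the rounded values shown on this page, not exact API limits, and the function name `fits_budget` is hypothetical. It also assumes the input and output figures are independent caps; some APIs count output tokens against the same window, in which case the check would need to sum them.

```python
# Rounded limits as displayed on this page (assumptions, not exact API limits).
MAX_INPUT_TOKENS = 1_000_000   # "1.0M in"
MAX_OUTPUT_TOKENS = 33_000     # "33K out"


def fits_budget(prompt_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if both token counts are within the displayed limits.

    Assumes the input and output limits are independent caps.
    """
    if prompt_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        prompt_tokens <= MAX_INPUT_TOKENS
        and requested_output_tokens <= MAX_OUTPUT_TOKENS
    )


if __name__ == "__main__":
    print(fits_budget(500_000, 10_000))    # within both limits
    print(fits_budget(1_200_000, 10_000))  # prompt exceeds the input limit
```

Token counts would come from whatever tokenizer the caller uses; that step is left out here on purpose.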
AI Performance Evaluation
Arena Overall Score: 1312 ±4 (as of 2026-04-23)
Overall Rank: No. 207 (100,105 votes)
Arena by Ability
Hard Prompts: 1311 ±6, No. 213
Expert Knowledge: 1286 ±12, No. 206
Instruction Following: 1294 ±6, No. 205
Conversation Memory: 1298 ±8, No. 206
Creative: 1285 ±8, No. 194
Coding: 1338 ±7, No. 214
Math: 1303 ±8, No. 184
Arena by Occupation
Creative Writing: 1306 ±6, No. 188
Social Sciences: 1321 ±8, No. 211
Media: 1290 ±8, No. 182
Business: 1282 ±9, No. 226
Healthcare: 1305 ±12, No. 212
Legal: 1317 ±11, No. 215
Software: 1324 ±6, No. 221
Mathematics: 1308 ±8, No. 186
Source: Arena Intelligence
Overall
AA Intelligence Index: 26% (↓12%)
ForecastBench: 59% (↑0%)
Reasoning & Math
AA Math Index: 35% (↓39%)
GPQA Diamond: 67% (↓14%)
HLE: 4.6% (↓13%)
MMLU-Pro: 81% (↓1%)
AIME 2025: 35% (↓39%)
MATH-500: 91% (↓2%)
Coding
AA Coding Index: 22% (↓12%)
LiveCodeBench: 46% (↓20%)
TAU2: 47% (↓26%)
TerminalBench: 14% (↓17%)
SciCode: 38% (↓3%)
Language & Instructions
IFBench: 43% (↓14%)
AA-LCR: 61% (↓1%)
Hallucination (HHEM): 5.6% (↓5%)
Factual (HHEM): 94% (↑4%)
Output Speed
Standard Mode: 103 tok/s (↑21)
First Output: 0.58 s
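The two speed figures above can be combined into a rough end-to-end estimate: time to first output plus the output length divided by throughput. The sketch below assumes throughput stays constant at the listed rate for the whole response, which real traffic may not match. The function name `estimate_response_seconds` is hypothetical.

```python
# Figures as listed on this page; treated as constants for a rough estimate.
TOKENS_PER_SECOND = 103.0     # "Standard Mode" output speed
FIRST_OUTPUT_SECONDS = 0.58   # time until the first output arrives


def estimate_response_seconds(output_tokens: int) -> float:
    """Estimate total response time, assuming constant throughput."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return FIRST_OUTPUT_SECONDS + output_tokens / TOKENS_PER_SECOND


if __name__ == "__main__":
    # 1030 tokens at 103 tok/s is 10 s of generation plus 0.58 s to first output.
    print(round(estimate_response_seconds(1030), 2))  # 10.58
```

Under the same assumption, a response that uses the full 33K output budget would take a little over five minutes.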