DeepSeek

DeepSeek V4 Flash

2026-04-24

DeepSeek V4 Flash is the compact, low-latency variant of the V4 series, released April 24, 2026, with 284B total parameters (13B active) — built for cost-efficient inference without sacrificing long-context reasoning. It shares the same Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) architecture as V4 Pro, supporting a 1M-token context window with both Thinking and Non-Thinking modes. Despite its smaller footprint, the V4 Flash base model outperforms the much larger V3.2 base across most benchmarks, particularly in long-context tasks. At $0.14 per million input tokens and $0.28 per million output tokens, it ranks among the cheapest frontier-class models available, making it ideal for high-throughput agentic and document-processing workloads.

Reasoning | Proprietary Model
Knowledge Cutoff: Unknown
Context Window: 1.0M tokens in / 384K tokens out
Cost per 1M Tokens: $0.14 in / $0.28 out
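The quoted per-token rates translate directly into per-request costs; a minimal sketch, using the rates listed above (the token counts in the example are hypothetical):

```python
# Estimate inference cost from the quoted DeepSeek V4 Flash rates.
# Rates come from the listing above; the example token counts are illustrative.
INPUT_RATE = 0.14 / 1_000_000   # USD per input token ($0.14 / 1M)
OUTPUT_RATE = 0.28 / 1_000_000  # USD per output token ($0.28 / 1M)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 200K-token document summarized into a 2K-token answer.
cost = request_cost(200_000, 2_000)
print(f"${cost:.4f}")  # 200000*0.14/1e6 + 2000*0.28/1e6 = 0.028 + 0.00056 ≈ $0.0286
```

At these rates even near-full-context requests stay in the cents range, which is what makes the model attractive for high-throughput document workloads.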

AI Performance Evaluation

Arena Overall Score: 1439 ±9 (as of 2026-04-23)
Overall Rank: No.47 (3,607 votes)
Arena by Ability
Hard Prompts: 1463 ±12 (No.44)
Expert Knowledge: 1456 ±29 (No.48)
Instruction Following: 1428 ±16 (No.48)
Conversation Memory: 1440 ±23 (No.52)
Creative: 1404 ±23 (No.54)
Coding: 1479 ±19 (No.52)
Math: 1437 ±35 (No.45)
Arena by Occupation
Creative Writing: 1421 ±19 (No.45)
Social Sciences: 1460 ±22 (No.45)
Media: 1404 ±21 (No.52)
Business: 1430 ±21 (No.57)
Healthcare: 1468 ±35 (No.39)
Legal: 1464 ±32 (No.30)
Software: 1476 ±15 (No.47)
Mathematics: 1449 ±40 (No.39)
Overall
AA Intelligence Index: 47% (↑8%)
Reasoning & Math
GPQA Diamond: 89% (↑8%)
HLE: 32% (↑15%)
Coding
AA Coding Index: 39% (↑5%)
TAU2: 95% (↑22%)
TerminalBench: 36% (↑5%)
SciCode: 45% (↑4%)
Language & Instructions
IFBench: 79% (↑22%)
AA-LCR: 63% (↑1%)
Output Speed
Standard Mode: 33 tok/s (↓49)
First Output: 1.97 s
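The speed figures above imply a simple end-to-end latency estimate; a rough sketch, assuming (as a simplification) that decoding runs at the listed steady-state rate after the first token:

```python
# Rough wall-clock estimate from the listed figures:
# 33 tok/s steady-state throughput, 1.97 s to first output token.
# The linear model (time = TTFT + tokens / rate) is an assumption.
TOKENS_PER_SEC = 33.0
FIRST_TOKEN_SEC = 1.97

def completion_seconds(output_tokens: int) -> float:
    """Estimated seconds to stream `output_tokens` output tokens."""
    return FIRST_TOKEN_SEC + output_tokens / TOKENS_PER_SEC

# Example: a 1,000-token answer takes roughly half a minute.
print(f"{completion_seconds(1000):.1f} s")  # 1.97 + 1000/33 ≈ 32.3 s
```

The low time-to-first-token matters more than raw throughput for interactive use, while batch document-processing jobs are dominated by the 33 tok/s decode rate.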