
DeepSeek V4 Flash is the compact, low-latency variant of the V4 series, released April 24, 2026, with 284B total parameters (13B active). It is built for cost-efficient inference without sacrificing long-context reasoning, sharing the Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) architecture of V4 Pro and supporting a 1M-token context window with both Thinking and Non-Thinking modes. Despite its smaller footprint, the V4 Flash base model outperforms the much larger V3.2 base on most benchmarks, particularly long-context tasks. At $0.14 per million input tokens and $0.28 per million output tokens, it ranks among the cheapest frontier-class models available, making it well suited to high-throughput agentic and document-processing workloads.

Author: DeepSeek
Release Date: 2026-04-24
Knowledge Cutoff: –
License: Proprietary
I/O Format: –
Context Length (context / max output): 1.0M / 384K tokens
API Pricing (per 1M tokens, input / output): $0.14 / $0.28 (see the cost sketch below the table)
Output Speed: 33 tok/s
Arena Overall: 1439
Intelligence Index: 46.5
Coding Index: 38.7
Math Index: –
LiveBench: –
ForecastBench: –
GPQA Diamond: 89.4%
HLE: 32.1%
MMLU-Pro: –
AIME 2025: –
MATH-500: –
LB Reasoning: –
LB Math: –
LB Data Analysis: –
LiveCodeBench: –
LB Coding: –
LB Agentic: –
TAU2: 95.0%
TerminalBench: 35.6%
SciCode: 44.9%
IFBench: 79.2%
AA-LCR: 0.6
Hallucination (HHEM): –
Factual Consistency (HHEM): –
LB Language: –
LB Instruction Following: –