
DeepSeek V4 Flash is the compact, low-latency variant of the V4 series, released April 24, 2026, with 284B total parameters (13B active). It is built for cost-efficient inference without sacrificing long-context reasoning, sharing the Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) architecture of V4 Pro and supporting a 1M-token context window with both Thinking and Non-Thinking modes. Despite its smaller footprint, the V4 Flash base model outperforms the much larger V3.2 base on most benchmarks, particularly long-context tasks. At $0.14 per million input tokens and $0.28 per million output tokens, it ranks among the cheapest frontier-class models available, making it well suited to high-throughput agentic and document-processing workloads.

Author: DeepSeek
Release Date: 2026-04-24
Knowledge Cutoff: –
License: Proprietary
I/O Format: –
Context Length (context / max output): 1.0M / 384K tokens
API Pricing (per 1M tokens, input / output): $0.14 / $0.28 (see the cost sketch below the table)
Output Speed: 33 tok/s
Arena Overall: 1439
Intelligence Index: 46.5
Coding Index: 38.7
Math Index: –
LiveBench: –
ForecastBench: –
GPQA Diamond: 89.4%
HLE: 32.1%
MMLU-Pro: –
AIME 2025: –
MATH-500: –
LB Reasoning: –
LB Math: –
LB Data Analysis: –
LiveCodeBench: –
LB Coding: –
LB Agentic: –
TAU2: 95.0%
TerminalBench: 35.6%
SciCode: 44.9%
IFBench: 79.2%
AA-LCR: 0.6
Hallucination (HHEM): –
Factual Consistency (HHEM): –
LB Language: –
LB Instruction Following: –