DeepSeek V4 Flash is the compact, low-latency variant of the V4 series, released on April 24, 2026, with 284B total parameters (13B active). It is built for cost-efficient inference without sacrificing long-context reasoning, and it shares the same Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) architecture as V4 Pro, supporting a 1M-token context window with both Thinking and Non-Thinking modes. Despite its smaller footprint, the V4 Flash base model outperforms the much larger V3.2 base across most benchmarks, particularly on long-context tasks. At $0.14 per million input tokens and $0.28 per million output tokens, it ranks among the cheapest frontier-class models available, making it well suited to high-throughput agentic and document-processing workloads.
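
To make the pricing concrete, here is a minimal cost-estimation sketch using the per-million-token rates quoted above. The function name, workload size, and token counts are illustrative assumptions for a document-processing job, not part of DeepSeek's API or documentation.

```python
# Illustrative cost estimate at the listed V4 Flash rates
# ($0.14 per 1M input tokens, $0.28 per 1M output tokens).
# Workload numbers below are hypothetical.

INPUT_PRICE_PER_MTOK = 0.14   # USD per 1M input tokens
OUTPUT_PRICE_PER_MTOK = 0.28  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed rates."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_MTOK \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_MTOK

# Example: 100k requests, each averaging 8,000 input tokens
# and 500 output tokens.
per_request = estimate_cost(8_000, 500)
print(f"per request:  ${per_request:.6f}")                  # $0.001260
print(f"per 100k req: ${per_request * 100_000:.2f}")        # $126.00
```

At these assumed volumes, a long-context batch of 100k documents costs on the order of a hundred dollars, which is the kind of throughput economics the paragraph above is pointing at.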