AI Model Comparison

Voxtral Mini TTS is Mistral's first text-to-speech model, released in March 2026 as the generative counterpart to the Voxtral speech-recognition family. It is a ~4B-parameter model designed for low-latency voice agents and streaming applications, with a 4,096-token context window and raw-audio output. It supports nine languages (English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi, and Arabic) and performs zero-shot voice cloning from as little as three seconds of reference audio, preserving intonation, rhythm, and emotional delivery without explicit prosody tags. In head-to-head preference testing, it wins 68.4% of votes against ElevenLabs Flash v2.5.
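The limits in the description (nine languages, roughly three seconds of reference audio for cloning, a 4,096-token context window) can be checked before a request is sent. The following is a minimal sketch of such a pre-flight check, not Mistral's API: `SynthesisRequest`, `validate`, and the constant names are hypothetical, the token count is assumed to be supplied by the caller because the tokenizer is not described here, and the three-second figure is treated as a lower bound for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional

# Values copied from the model description; the names are hypothetical.
SUPPORTED_LANGUAGES = {
    "english", "french", "german", "spanish", "dutch",
    "portuguese", "italian", "hindi", "arabic",
}
MIN_REFERENCE_SECONDS = 3.0   # "as little as three seconds" of reference audio
MAX_CONTEXT_TOKENS = 4096     # stated context window


@dataclass
class SynthesisRequest:
    """Hypothetical description of a text-to-speech request."""
    text_tokens: int                           # token count, computed by the caller
    language: str                              # language name, e.g. "French"
    reference_seconds: Optional[float] = None  # set only when cloning a voice


def validate(req: SynthesisRequest) -> List[str]:
    """Return human-readable problems; an empty list means the request fits."""
    problems = []
    if req.language.strip().lower() not in SUPPORTED_LANGUAGES:
        problems.append(f"unsupported language: {req.language}")
    if req.text_tokens > MAX_CONTEXT_TOKENS:
        problems.append(
            f"text is {req.text_tokens} tokens, limit is {MAX_CONTEXT_TOKENS}"
        )
    if req.reference_seconds is not None and req.reference_seconds < MIN_REFERENCE_SECONDS:
        problems.append(
            f"reference audio is {req.reference_seconds}s, "
            f"need at least {MIN_REFERENCE_SECONDS}s"
        )
    return problems


if __name__ == "__main__":
    ok = SynthesisRequest(text_tokens=120, language="French", reference_seconds=3.5)
    bad = SynthesisRequest(text_tokens=5000, language="Japanese", reference_seconds=1.0)
    print(validate(ok))
    print(validate(bad))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue with a request at once.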

Author

Mistral AI

Release Date

March 2026

Knowledge Cutoff

—

License

Proprietary

I/O Format

Text input, audio output

Context Length

4,096 tokens

API I/O (1M)

—

How to Use

—

Output Speed

—

Arena Overall

—

Intelligence Index

—

Coding Index

—

Math Index

—

LiveBench

—

ForecastBench

—

GPQA Diamond

—

HLE

—

MMLU-Pro

—

AIME 2025

—

MATH-500

—

LB Reasoning

—

LB Math

—

LB Data Analysis

—

LiveCodeBench

—

LB Coding

—

LB Agentic

—

TAU2

—

TerminalBench

—

SciCode

—

IFBench

—

AA-LCR

—

Hallucination (HHEM)

—

Factual Consistency (HHEM)

—

LB Language

—

LB Instruction Following

—
