Mistral AI

Voxtral Mini TTS

Name: Mistral AI Voxtral Mini TTS
Author: Mistral AI

Voxtral Mini TTS is Mistral's first text-to-speech model, released in March 2026 as the generative counterpart to the Voxtral speech-recognition family. It is a ~4B-parameter model designed for low-latency voice agents and streaming applications, with a 4,096-token context window and raw-audio output. It supports nine languages (English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi, and Arabic) and performs zero-shot voice cloning from as little as three seconds of reference audio, preserving intonation, rhythm, and emotional delivery without explicit prosody tags. In head-to-head testing, it wins 68.4% of preference votes against ElevenLabs Flash v2.5.

Proprietary Model

Knowledge Cutoff

Unknown

The date this AI finished learning. It may not know about things that happened after this date.

Input → Output Format

The types of content this AI can receive, and what it can produce in return.

Context Memory

4,096 tokens

The maximum amount of text the AI can read and process in a single request. A larger number means it can handle longer documents or conversations.

Cost/1M Tokens

—

The cost of using this AI directly in your own application. Shown in USD per 1 million units of text (tokens).
