AI Model Comparison

Our Story

GPT Audio is OpenAI's multimodal audio model designed for native speech-to-speech interaction via the Chat Completions API. Unlike traditional voice pipelines that chain separate speech-to-text and text-to-speech models, GPT Audio processes and generates audio directly through a single model, resulting in lower latency, more natural-sounding voices, and better preservation of speech nuances such as tone and emotion.

Author

OpenAI

Release Date

2025-08-28

Knowledge Cutoff

2023-10-01

License

Proprietary

I/O Format

Context Length

128K / 16K

API I/O (1M)

$2.5 / $10

How to Use

—

Output Speed

—

Arena Overall

—

Intelligence Index

—

Coding Index

—

Math Index

—

LiveBench

—

ForecastBench

—

GPQA Diamond

—

HLE

—

MMLU-Pro

—

AIME 2025

—

MATH-500

—

LB Reasoning

—

LB Math

—

LB Data Analysis

—

LiveCodeBench

—

LB Coding

—

LB Agentic

—

TAU2

—

TerminalBench

—

SciCode

—

IFBench

—

AA-LCR

—

Hallucination (HHEM)

—

Factual Consistency (HHEM)

—

LB Language

—

LB Instruction Following

—

Calculate Cost View Model Details

1 / 3

Swipe to compare