1 / 3
Swipe to compare

NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-agent in enterprise agent systems. It accepts text, image, video, and audio inputs and produces text output, enabling agents to perceive and reason across modalities in a single inference loop. Built on a hybrid MoE Transformer-Mamba architecture with Conv3D video layers and Efficient Video Sampling (EVS), it delivers approximately 2× higher throughput and 2.5× lower compute for video reasoning versus separate vision + speech pipelines. It supports up to 300K context length and a 16,384 reasoning budget, with extended thinking enabled via reasoning.

Author
NVIDIANVIDIA
Release Date
2026-04-28
Knowledge Cutoff
Unknown
License
Proprietary
I/O Format
Context Length
256K / 66K
API I/O (1M)
How to Use
Output Speed
Arena Overall
Intelligence Index
Coding Index
Math Index
LiveBench
ForecastBench
GPQA Diamond
HLE
MMLU-Pro
AIME 2025
MATH-500
LB Reasoning
LB Math
LB Data Analysis
LiveCodeBench
LB Coding
LB Agentic
TAU2
TerminalBench
SciCode
IFBench
AA-LCR
Hallucination (HHEM)
Factual Consistency (HHEM)
LB Language
LB Instruction Following