AI Model Comparison

NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to serve as a perception and context sub-agent in enterprise agent systems. It accepts text, image, video, and audio inputs and produces text output, so agents can perceive and reason across modalities in a single inference loop. Built on a hybrid MoE Transformer-Mamba architecture with Conv3D video layers and Efficient Video Sampling (EVS), it delivers approximately 2× higher throughput and 2.5× lower compute for video reasoning than separate vision + speech pipelines. It supports a context length of up to 300K tokens and a reasoning budget of 16,384 tokens, with extended thinking enabled via reasoning.

Author

NVIDIA

Release Date

2026-04-28

Knowledge Cutoff

Unknown

License

Proprietary

I/O Format

Context Length

256K / 66K

API I/O (1M)

—

How to Use

—

Output Speed

—

Arena Overall

—

Intelligence Index

—

Coding Index

—

Math Index

—

LiveBench

—

ForecastBench

—

GPQA Diamond

—

HLE

—

MMLU-Pro

—

AIME 2025

—

MATH-500

—

LB Reasoning

—

LB Math

—

LB Data Analysis

—

LiveCodeBench

—

LB Coding

—

LB Agentic

—

TAU2

—

TerminalBench

—

SciCode

—

IFBench

—

AA-LCR

—

Hallucination (HHEM)

—

Factual Consistency (HHEM)

—

LB Language

—

LB Instruction Following

—
