Nemotron 3 Super is NVIDIA's open hybrid Mamba-Transformer MoE model with 120 billion total parameters, of which only 12 billion are active per token for compute efficiency. Its hybrid architecture combines Mamba layers for efficient long-sequence processing with Transformer attention layers for precise reasoning, delivering over 5× the throughput of its predecessor. With a native 1M-token context window and NVFP4 precision optimized for Blackwell GPUs, it scores 85.6% on PinchBench, the best result among open models, making it well suited for complex multi-agent applications, software development, and agentic reasoning.
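As a minimal sketch of what using an open-weights release like this might look like, the snippet below loads the model through the Hugging Face `transformers` library and runs a single chat turn. The repository ID `nvidia/Nemotron-3-Super` is a hypothetical placeholder, and the exact loading flags and chat template may differ in the actual release; check NVIDIA's official model card for the real identifiers.

```python
# Hypothetical usage sketch; the model ID and loading details are assumptions,
# not confirmed by NVIDIA's release notes.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Nemotron-3-Super"  # placeholder repository name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # use the checkpoint's native precision
    device_map="auto",       # shard the 120B parameters across available GPUs
    trust_remote_code=True,  # hybrid Mamba-Transformer blocks often ship custom modeling code
)

messages = [
    {"role": "user", "content": "Summarize the Mamba architecture in two sentences."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Note that since only 12 billion of the 120 billion parameters are active per token, inference compute scales with the active subset, while memory requirements are still driven by the full parameter count.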