NVIDIA: Nemotron 3 Super

Text input Text output
Author's Description

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer Mixture-of-Experts architecture with multi-token prediction (MTP), it delivers over 50% higher token generation compared to leading open models. The model features a 1M token context window for long-term agent coherence, cross-document reasoning, and multi-step task planning. Latent MoE enables calling 4 experts for the inference cost of only one, improving intelligence and generalization. Multi-environment RL training across 10+ environments delivers leading accuracy on benchmarks including AIME 2025, TerminalBench, and SWE-Bench Verified. Fully open with weights, datasets, and recipes under the NVIDIA Open License, Nemotron 3 Super allows easy customization and secure deployment anywhere — from workstation to cloud.

Key Specifications
Cost
$$$$
Context
8K
Parameters
120B
Released
Mar 11, 2026
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Temperature Frequency Penalty Response Format Include Reasoning Reasoning Max Tokens Top P Presence Penalty
Features

This model supports the following features:

Response Format Reasoning
Performance Summary

NVIDIA Nemotron 3 Super demonstrates competitive response times, ranking in the 53rd percentile for speed across various benchmarks. Its pricing is moderate, placing it in the 37th percentile. Notably, the model exhibits exceptional reliability with a 98% success rate, indicating minimal technical failures and consistent evaluable responses. In terms of performance across categories, Nemotron 3 Super shows a significant strength in Reasoning, achieving 94.0% accuracy (80th percentile), and strong ethical understanding with 99.0% accuracy (53rd percentile). Its Mathematics performance is also solid at 83.0% accuracy (40th percentile). However, the model struggles with Instruction Following (33.0% accuracy, 28th percentile) and Email Classification (89.0% accuracy, 11th percentile), suggesting areas for improvement in precise directive execution and nuanced categorization. Hallucination rates are moderate at 86.0% accuracy (31st percentile), indicating some room for improvement in acknowledging uncertainty. Its Coding and General Knowledge capabilities are average, at 80.0% and 91.5% accuracy respectively, both falling below the 40th percentile.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.3
Completion $0.9

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Nebius
Nebius | nvidia/nemotron-3-super-120b-a12b-20230311 8K $0.3 / 1M tokens $0.9 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by nvidia