Author's Description
NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model that activates just 12B parameters per token for compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer Mixture-of-Experts architecture with multi-token prediction (MTP), it delivers over 50% higher token generation throughput than leading open models. The model features a 1M-token context window for long-term agent coherence, cross-document reasoning, and multi-step task planning. Latent MoE enables calling 4 experts for the inference cost of only one, improving intelligence and generalization. Multi-environment RL training across 10+ environments delivers leading accuracy on benchmarks including AIME 2025, TerminalBench, and SWE-Bench Verified. Fully open with weights, datasets, and recipes under the NVIDIA Open License, Nemotron 3 Super allows easy customization and secure deployment anywhere, from workstation to cloud.
Performance Summary
NVIDIA Nemotron 3 Super ranks in the 53rd percentile for speed and the 37th percentile for price across benchmarks, so both its response times and its pricing are mid-range. Reliability is high: a 98% success rate means the model almost always returned an evaluable response without a technical failure.

By category, Reasoning is the clear strength at 94.0% accuracy (80th percentile). Ethics reaches 99.0% accuracy (53rd percentile) and Mathematics 83.0% (40th percentile). The weakest results relative to other models are Instruction Following (33.0% accuracy, 28th percentile) and Email Classification (89.0% accuracy, 11th percentile). The Email Classification score is high in absolute terms but still trails most other models. Together these results point to weaknesses in precise directive execution and nuanced categorization. On the Hallucination benchmark the model scores 86.0% accuracy (31st percentile), leaving room for improvement in acknowledging uncertainty. Coding (80.0%) and General Knowledge (91.5%) both fall below the 40th percentile.
Model Pricing
Current Pricing
| Feature | Price (per 1M tokens) |
|---|---|
| Prompt | $0.3 |
| Completion | $0.9 |
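As a rough illustration of how the per-1M-token prices above translate into the cost of a single request, here is a minimal sketch. The function name `estimate_cost_usd` and the token counts in the usage line are hypothetical, and the rates are simply the two figures from the table; actual billing may round or meter differently.

```python
# Prices copied from the table above (USD per 1M tokens).
PROMPT_PRICE_PER_M = 0.30
COMPLETION_PRICE_PER_M = 0.90


def estimate_cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of one request in USD (hypothetical helper)."""
    if prompt_tokens < 0 or completion_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        prompt_tokens * PROMPT_PRICE_PER_M
        + completion_tokens * COMPLETION_PRICE_PER_M
    )
    # Prices are quoted per million tokens, so scale down at the end.
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 6,000-token prompt with a 1,500-token completion.
    print(f"${estimate_cost_usd(6_000, 1_500):.5f}")
```

With these rates, a 6,000-token prompt and a 1,500-token completion come to roughly $0.00315. Completion tokens cost three times as much as prompt tokens, so output length tends to dominate the bill for generation-heavy workloads.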
Available Endpoints
| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
|---|---|---|---|---|
| Nebius | nvidia/nemotron-3-super-120b-a12b-20230311 | 8K | $0.3 / 1M tokens | $0.9 / 1M tokens |
Benchmark Results
| Benchmark | Category | Reasoning | Strategy | Free | Executions | Accuracy | Cost | Duration |
|---|---|---|---|---|---|---|---|---|
Other Models by nvidia
| Model | Released | Params | Context | Modalities | Speed | Ability | Cost |
|---|---|---|---|---|---|---|---|
| NVIDIA: Nemotron 3 Nano 30B A3B | Dec 14, 2025 | 30B | 262K | Text input, Text output | ★★★ | ★★★★★ | $$$ |
| NVIDIA: Nemotron Nano 12B 2 VL | Oct 28, 2025 | 12B | 131K | Text input, Video input, Image input, Text output | ★ | ★★ | $$$$ |
| NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 | Oct 10, 2025 | 49B | 131K | Text input, Text output | ★★ | ★★★★ | $$$$ |
| NVIDIA: Nemotron Nano 9B V2 | Sep 05, 2025 | 9B | 128K | Text input, Text output | ★ | ★★ | $ |
| NVIDIA: Llama 3.3 Nemotron Super 49B v1 (Unavailable) | Apr 08, 2025 | 49B | 131K | Text input, Text output | ★★★ | ★★ | $$ |
| NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 (Unavailable) | Apr 08, 2025 | 253B | 131K | Text input, Text output | ★ | ★★ | $$$$$ |
| NVIDIA: Llama 3.1 Nemotron 70B Instruct | Oct 14, 2024 | 70B | 131K | Text input, Text output | ★★★ | ★★ | $$ |