Author's Description
Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta’s Llama-3.3-70B-Instruct with a 128K context window. It is post-trained for agentic workflows (RAG, tool calling) via SFT across math, code, science, and multi-turn chat, followed by multiple RL stages: Reward-aware Preference Optimization (RPO) for alignment, RL with Verifiable Rewards (RLVR) for step-wise reasoning, and iterative DPO to refine tool-use behavior. A distillation-driven Neural Architecture Search (“Puzzle”) replaces some attention blocks and varies FFN widths to shrink the memory footprint and improve throughput, enabling single-GPU (H100/H200) deployment while preserving instruction following and CoT quality. In internal evaluations (NeMo-Skills, up to 16 runs, temp = 0.6, top_p = 0.95), the model reports strong reasoning/coding results, e.g., MATH500 pass@1 = 97.4, AIME-2024 = 87.5, AIME-2025 = 82.71, GPQA = 71.97, LiveCodeBench (24.10–25.02) = 73.58, and MMLU-Pro (CoT) = 79.53. The model targets practical inference efficiency (high tokens/s, reduced VRAM) with Transformers/vLLM support and explicit “reasoning on/off” modes (chat-first defaults; greedy decoding recommended when reasoning is disabled). It is suitable for building agents, assistants, and long-context retrieval systems where a balanced accuracy-to-cost ratio and reliable tool use matter.
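As a rough illustration of these deployment claims, here is a minimal vLLM sketch using the card's evaluation sampling settings (temp = 0.6, top_p = 0.95) with reasoning on, and greedy decoding with reasoning off. The Hugging Face repo id and the "/no_think" system-prompt toggle are assumptions taken from NVIDIA's published conventions; verify both against the model card.

```python
# Minimal vLLM sketch, not an official recipe. Assumptions: the HF repo id
# below and the "/no_think" system-prompt toggle for disabling reasoning.
from vllm import LLM, SamplingParams

llm = LLM(
    model="nvidia/Llama-3_3-Nemotron-Super-49B-v1_5",  # assumed HF repo id
    max_model_len=131072,  # 128K context per the description
)

# Reasoning ON (default): the card's evaluation settings.
think = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=4096)
out = llm.chat(
    [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    sampling_params=think,
)
print(out[0].outputs[0].text)

# Reasoning OFF: greedy decoding is recommended when reasoning is disabled.
no_think = SamplingParams(temperature=0.0, max_tokens=512)
out = llm.chat(
    [
        {"role": "system", "content": "/no_think"},  # assumed toggle string
        {"role": "user", "content": "Summarize RAG in two sentences."},
    ],
    sampling_params=no_think,
)
print(out[0].outputs[0].text)
```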
Performance Summary
The NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 model, created on October 10, 2025, demonstrates strong performance in several key areas, particularly reliability and specialized capabilities. With an exceptional reliability ranking (97% success rate), it consistently returns usable responses and rarely fails outright. Its speed ranking falls in the 18th percentile, indicating longer response times, while moderate pricing at the 38th percentile offers balanced cost-efficiency. The model excels in Ethics (100% accuracy), General Knowledge (99%), Email Classification (99%), and Reasoning (92%), showing robust understanding and processing. Internal evaluations also report strong results on MATH500 (97.4 pass@1) and AIME-2024 (87.5), underscoring its mathematical and scientific reasoning. A relative weakness is Instruction Following (60.2% accuracy, 64th percentile), leaving room for improvement on complex multi-step directives. Despite 90% accuracy on the Hallucinations benchmark, its 37th percentile ranking shows it is not among the top performers at acknowledging uncertainty. The model's design, built around a distillation-driven Neural Architecture Search, targets practical inference efficiency, making it suitable for agentic workflows, assistants, and long-context retrieval systems where accuracy-to-cost balance and reliable tool use are critical.
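For context on how a figure like "97.4 pass@1" is usually produced from multiple sampled runs, here is a toy sketch of multi-run pass@1 averaging; the data is invented for illustration, and nothing here comes from NeMo-Skills itself.

```python
# Toy sketch of multi-run pass@1: average per-problem success over k sampled
# generations, then average across problems. Illustrative only; not the
# NeMo-Skills implementation.
from statistics import mean

def pass_at_1(per_problem_runs: list[list[bool]]) -> float:
    """per_problem_runs[i][j] = whether sampled run j solved problem i."""
    return mean(mean(runs) for runs in per_problem_runs)

# 3 problems, 4 runs each (the card cites up to 16 runs per problem).
runs = [
    [True, True, True, False],
    [True, True, True, True],
    [False, True, True, True],
]
print(f"pass@1 = {pass_at_1(runs):.3f}")  # 0.833
```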
Model Pricing
Current Pricing
| Type | Price (per 1M tokens) |
|---|---|
| Prompt | $0.10 |
| Completion | $0.40 |
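At these rates, per-request cost is simple arithmetic; the sketch below assumes only the listed prices and hypothetical token counts.

```python
# Cost arithmetic for the listed rates; a sketch, not a billing API.
PROMPT_RATE = 0.10 / 1_000_000      # USD per prompt token
COMPLETION_RATE = 0.40 / 1_000_000  # USD per completion token

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    return prompt_tokens * PROMPT_RATE + completion_tokens * COMPLETION_RATE

# Example: a 4K-token prompt with a 1K-token completion.
print(f"${request_cost(4096, 1024):.6f}")  # ≈ $0.000819
```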
Available Endpoints
| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
|---|---|---|---|---|
| DeepInfra | nvidia/llama-3.3-nemotron-super-49b-v1.5 | 131K | $0.10 / 1M tokens | $0.40 / 1M tokens |
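Since DeepInfra exposes an OpenAI-compatible API, the endpoint above can be called with the standard openai client; the base URL below follows DeepInfra's documented compatibility path, and the API key placeholder is yours to fill in.

```python
# Hedged sketch: calling the DeepInfra endpoint via the OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",  # DeepInfra's OpenAI-compat path
    api_key="YOUR_DEEPINFRA_API_KEY",                # placeholder
)

resp = client.chat.completions.create(
    model="nvidia/llama-3.3-nemotron-super-49b-v1.5",  # endpoint name from the table
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
    temperature=0.6,  # the card's recommended sampling for reasoning mode
    top_p=0.95,
)
print(resp.choices[0].message.content)
```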
Benchmark Results
| Benchmark | Category | Reasoning | Strategy | Free | Executions | Accuracy | Cost | Duration |
|---|---|---|---|---|---|---|---|---|
Other Models by nvidia
| Model | Released | Params | Context | Modalities | Speed | Ability | Cost |
|---|---|---|---|---|---|---|---|
| NVIDIA: Nemotron Nano 12B 2 VL | Oct 28, 2025 | 12B | 131K | Image input, Text input, Text output | ★ | ★★ | $$$$ |
| NVIDIA: Nemotron Nano 9B V2 | Sep 05, 2025 | 9B | 128K | Text input, Text output | ★ | ★★ | $ |
| NVIDIA: Llama 3.3 Nemotron Super 49B v1 (Unavailable) | Apr 08, 2025 | 49B | 131K | Text input, Text output | ★★★ | ★★ | $$ |
| NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 | Apr 08, 2025 | 253B | 131K | Text input, Text output | ★ | ★★ | $$$$$ |
| NVIDIA: Llama 3.1 Nemotron 70B Instruct | Oct 14, 2024 | 70B | 131K | Text input, Text output | ★★★ | ★★ | $$$ |