NVIDIA: Nemotron 3 Ultra

Text input Text output
Author's Description

NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it...

Key Specifications
Cost
$$$$
Context
262K
Parameters
550B
Released
Jun 03, 2026
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Frequency Penalty Presence Penalty Logit Bias Include Reasoning Tool Choice Stop Tools Temperature Structured Outputs Seed Min P Top P Max Tokens Response Format Reasoning
Features

This model supports the following features:

Response Format Tools Structured Outputs Reasoning
Performance Summary

NVIDIA Nemotron 3 Ultra demonstrates moderate speed performance, ranking in the 31st percentile across benchmarks, and offers competitive pricing, placing it in the 50th percentile. The model excels in specific areas, achieving perfect accuracy in both Instruction Following and Email Classification. For these tasks, it stands out as the most accurate model at its price point and among models of comparable speed. However, its performance varies significantly across other categories. It shows a reasonable ability to acknowledge uncertainty, with 95.2% accuracy in the Hallucinations benchmark. Conversely, the model exhibits notable weaknesses in complex reasoning tasks, scoring only 39.3% in Reasoning, and particularly struggles with specialized knowledge and problem-solving. Its accuracy in Coding (22.9%), General Knowledge (28.6%), Ethics (7.7%), and Mathematics (15.0%) is considerably low, placing it in the lower percentiles for these categories. The model's architecture, a hybrid Transformer-Mamba mixture-of-experts with 55B active parameters, suggests a design optimized for certain types of tasks, while indicating areas for further development in others.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.5
Completion $2.5
Input Cache Read $0.15

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
DeepInfra
DeepInfra | nvidia/nemotron-3-ultra-550b-a55b-20260604 262K $0.5 / 1M tokens $2.5 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by nvidia