NVIDIA: Nemotron Nano 9B V2

Text input Text output Free Option
Author's Description

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be controlled via a system prompt. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so.

Key Specifications
Cost
$
Context
128K
Parameters
9B
Released
Sep 05, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Reasoning Structured Outputs Response Format Tool Choice Tools Include Reasoning
Features

This model supports the following features:

Response Format Tools Reasoning Structured Outputs
Performance Summary

NVIDIA: Nemotron Nano 9B V2 demonstrates exceptional speed, consistently ranking among the fastest models. It also offers competitive pricing, typically providing cost-effective solutions. The model exhibits outstanding reliability with a 99% success rate, indicating minimal technical failures. In terms of benchmark performance, Nemotron Nano 9B V2 shows strong capabilities in several areas. It achieves high accuracy in Hallucinations (98.0%), General Knowledge (98.6%), and particularly excels in Ethics with a perfect 100% accuracy, making it the most accurate among models of comparable speed. Its Reasoning capabilities are also a significant strength, scoring 89.8% accuracy and ranking in the top 3 for cost efficiency in this category. However, the model exhibits notable weaknesses in Instruction Following and Mathematics, where it scored 0.0% accuracy in both benchmarks. Coding performance is moderate at 83.0% accuracy, while Email Classification is 94.1%, placing it in the lower percentile for this task. The model's design, which involves generating a reasoning trace before a final response, is a unique feature that can be controlled via system prompts.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.04
Completion $0.16

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Nvidia
Nvidia | nvidia/nemotron-nano-9b-v2 128K $0.04 / 1M tokens $0.16 / 1M tokens
Nvidia
Nvidia | nvidia/nemotron-nano-9b-v2 128K $0.04 / 1M tokens $0.16 / 1M tokens
DeepInfra
DeepInfra | nvidia/nemotron-nano-9b-v2 131K $0.04 / 1M tokens $0.16 / 1M tokens
Together
Together | nvidia/nemotron-nano-9b-v2 131K $0.06 / 1M tokens $0.25 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by nvidia