NVIDIA: Nemotron Nano 12B 2 VL

Name: NVIDIA: Nemotron Nano 12B 2 VL
Brand: nvidia
Price: 2e-7 USD
Availability: InStock
Rating: 2.4 (8 reviews)

Back

Image input Text input Video input Text output Free Option

Author's Description

NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba’s memory-efficient sequence modeling for significantly higher throughput and lower latency. The model supports inputs of text and multi-image documents, producing natural-language outputs. It is trained on high-quality NVIDIA-curated synthetic datasets optimized for optical-character recognition, chart reasoning, and multimodal comprehension. Nemotron Nano 2 VL achieves leading results on OCRBench v2 and scores ≈ 74 average across MMMU, MathVista, AI2D, OCRBench, OCR-Reasoning, ChartQA, DocVQA, and Video-MME—surpassing prior open VL baselines. With Efficient Video Sampling (EVS), it handles long-form videos while reducing inference cost. Open-weights, training data, and fine-tuning recipes are released under a permissive NVIDIA open license, with deployment supported across NeMo, NIM, and major inference runtimes.

Key Specifications

Cost

$$$$

Context

131K

Parameters

12B

Released

Oct 28, 2025

Speed

★

Ability

★★

Reliability

★★

Hugging Face

Supported Parameters

This model supports the following parameters:

Presence Penalty Response Format Min P Temperature Top P Frequency Penalty Max Tokens Seed Stop Reasoning Include Reasoning

Features

This model supports the following features:

Reasoning Response Format

Performance Summary

NVIDIA's Nemotron Nano 12B 2 VL, a 12-billion-parameter multimodal model, demonstrates exceptional speed, consistently ranking among the fastest models, and offers highly competitive pricing. Its reliability is strong, with a 91% success rate across benchmarks. Designed for video understanding and document intelligence, it leverages a hybrid Transformer-Mamba architecture for high throughput and low latency. The model exhibits significant strengths in knowledge-based tasks, achieving 99.5% accuracy in General Knowledge and a perfect 100.0% in Ethics, where it is noted as the most accurate model at its price point and among models of similar speed. It also performs well in Reasoning (94.0%) and Email Classification (98.0%). However, Nemotron Nano 2 VL shows notable weaknesses in Instruction Following, scoring 0.0% accuracy, and struggles with Coding (65.6%) and Mathematics (52.6%). Its performance on Hallucinations (80.0%) suggests room for improvement in acknowledging uncertainty. Despite these areas for development, its leading results on OCRBench v2 and strong average across various multimodal benchmarks highlight its specialized capabilities.

Model Pricing

Current Pricing

Feature	Price (per 1M tokens)
Prompt	$0.2
Completion	$0.6

Price History

Available Endpoints

Provider	Endpoint Name	Context Length	Pricing (Input)	Pricing (Output)
DeepInfra	DeepInfra \| nvidia/nemotron-nano-12b-v2-vl	131K	$0.2 / 1M tokens	$0.6 / 1M tokens
Nebius	Nebius \| nvidia/nemotron-nano-12b-v2-vl	131K	$0.07 / 1M tokens	$0.2 / 1M tokens

Benchmark Results

Benchmark	Category	Reasoning	Strategy	Free	Executions	Accuracy	Cost	Duration

Other Models by nvidia

	Released	Params	Context	Filter by Modalities All Modalities	Speed	Ability	Cost
NVIDIA: Nemotron 3 Nano 30B A3B	Dec 14, 2025	30B	262K	Text input Text output	★★★	★★★★★	$$$
NVIDIA: Llama 3.3 Nemotron Super 49B V1.5	Oct 10, 2025	49B	131K	Text input Text output	★	★★★★	$$$$
NVIDIA: Nemotron Nano 9B V2	Sep 05, 2025	9B	128K	Text input Text output	★	★★	$
NVIDIA: Llama 3.3 Nemotron Super 49B v1 Unavailable	Apr 08, 2025	49B	131K	Text input Text output	★★★	★★	$$
NVIDIA: Llama 3.1 Nemotron Ultra 253B v1	Apr 08, 2025	253B	131K	Text input Text output	★	★★	$$$$$
NVIDIA: Llama 3.1 Nemotron 70B Instruct	Oct 14, 2024	70B	131K	Text input Text output	★★★	★★	$$$