Author's Description
Llama-3.3-Nemotron-Super-49B-v1 is a large language model (LLM) optimized for advanced reasoning, conversational interactions, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta's Llama-3.3-70B-Instruct, it was produced with a Neural Architecture Search (NAS) approach that significantly improves efficiency and reduces memory requirements. This allows the model to support a context length of up to 128K tokens and to fit on a single high-performance GPU, such as the NVIDIA H200. Note: you must include `detailed thinking on` in the system prompt to enable reasoning. Please see [Usage Recommendations](https://huggingface.co/nvidia/Llama-3_1-Nemotron-Ultra-253B-v1#quick-start-and-usage-recommendations) for more details.
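As a minimal sketch of the reasoning toggle, the snippet below calls the model through an OpenAI-compatible chat completions endpoint. The base URL, environment variable name, and prompts are placeholders (assumptions, not part of the author's documentation); substitute your provider's values.

```python
import os
from openai import OpenAI

# Assumption: an OpenAI-compatible endpoint (e.g. the Nebius listing below).
# Replace the base URL and API key variable with your provider's values.
client = OpenAI(
    base_url="https://api.studio.nebius.ai/v1/",
    api_key=os.environ["NEBIUS_API_KEY"],
)

response = client.chat.completions.create(
    model="nvidia/llama-3.3-nemotron-super-49b-v1",
    messages=[
        # "detailed thinking on" enables reasoning; "detailed thinking off" disables it.
        {"role": "system", "content": "detailed thinking on"},
        {"role": "user", "content": "Summarize the trade-offs of retrieval-augmented generation."},
    ],
)
print(response.choices[0].message.content)
```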
Key Specifications
Supported Parameters
This model supports the following parameters:
Performance Summary
The NVIDIA: Llama 3.3 Nemotron Super 49B v1 model demonstrates a balanced performance profile, particularly excelling in reliability and cost-effectiveness. It consistently returns evaluable responses, with an 80% success rate indicating strong technical stability. It is typically cost-effective, ranking in the 73rd percentile for pricing across benchmarks, and its speed is competitive, placing it in the 56th percentile for response times.

In terms of specific capabilities, the model shows high accuracy in General Knowledge (98.0%) and Email Classification (97.0%), with the latter also benefiting from exceptionally fast processing times. It performs well in Ethics at 94.0% accuracy, though its percentile ranking suggests other models surpass it in this domain. A notable strength is its efficiency, derived from a Neural Architecture Search approach, which allows it to support a 128K context length and run on a single high-performance GPU.

However, the model exhibits a significant weakness in Coding tasks, achieving only 6.0% accuracy and placing it in the 12th percentile. Its instruction-following accuracy is moderate at 56.6%, and it shows slower durations on both the instruction-following and coding benchmarks. Overall, it is well suited for advanced reasoning, conversational interactions, RAG, and tool-calling, provided coding capability is not a primary requirement.
Model Pricing
Current Pricing
| Feature | Price (per 1M tokens) |
|---|---|
| Prompt | $0.13 |
| Completion | $0.40 |
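To make these rates concrete, the sketch below works through the per-request cost arithmetic. The function name and example token counts are illustrative only, not part of any official SDK.

```python
PROMPT_PRICE = 0.13 / 1_000_000      # USD per prompt token
COMPLETION_PRICE = 0.40 / 1_000_000  # USD per completion token

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of a single request at the listed rates."""
    return prompt_tokens * PROMPT_PRICE + completion_tokens * COMPLETION_PRICE

# Example: a 4,000-token prompt with a 1,000-token completion
print(f"${request_cost(4_000, 1_000):.6f}")  # -> $0.000920
```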
Price History
Available Endpoints
| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
|---|---|---|---|---|
| Nebius | nvidia/llama-3.3-nemotron-super-49b-v1 | 131K | $0.13 / 1M tokens | $0.40 / 1M tokens |
Benchmark Results
| Benchmark | Category | Reasoning | Strategy | Free | Executions | Accuracy | Cost | Duration |
|---|---|---|---|---|---|---|---|---|
Other Models by nvidia
| Model | Released | Params | Context | Modalities | Speed | Ability | Cost |
|---|---|---|---|---|---|---|---|
| NVIDIA: Nemotron Nano 9B V2 | Sep 05, 2025 | 9B | 128K | Text input, Text output | ★ | ★★ | $ |
| NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 | Apr 08, 2025 | 253B | 131K | Text input, Text output | ★ | ★★ | $$$$$ |
| NVIDIA: Llama 3.1 Nemotron 70B Instruct | Oct 14, 2024 | 70B | 131K | Text input, Text output | ★★★ | ★★ | $$$ |