NVIDIA: Llama 3.3 Nemotron Super 49B v1

Input: Text · Output: Text
Author's Description

Llama-3.3-Nemotron-Super-49B-v1 is a large language model (LLM) optimized for advanced reasoning, conversational interactions, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta's Llama-3.3-70B-Instruct, it employs a Neural Architecture Search (NAS) approach, significantly enhancing efficiency and reducing memory requirements. This allows the model to support a context length of up to 128K tokens and fit efficiently on single high-performance GPUs, such as NVIDIA H200. Note: you must include `detailed thinking on` in the system prompt to enable reasoning. Please see [Usage Recommendations](https://huggingface.co/nvidia/Llama-3_1-Nemotron-Ultra-253B-v1#quick-start-and-usage-recommendations) for more.
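Enabling or disabling reasoning is controlled entirely through the system prompt. A minimal sketch, assuming an OpenAI-compatible chat completions endpoint (the base URL, API key, and model ID below are placeholders to adapt to your provider):

```python
# Sketch only: assumes an OpenAI-compatible endpoint; base_url, api_key,
# and model ID are placeholders, not confirmed values from this page.
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed endpoint, substitute your provider's
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="nvidia/llama-3.3-nemotron-super-49b-v1",
    messages=[
        # Per the model card, "detailed thinking on" enables reasoning mode;
        # use "detailed thinking off" for plain conversational output.
        {"role": "system", "content": "detailed thinking on"},
        {"role": "user", "content": "Explain why the sky is blue in two sentences."},
    ],
)
print(response.choices[0].message.content)
```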

Key Specifications
Cost: $$
Context: 131K
Parameters: 49B
Released: Apr 08, 2025
Supported Parameters

This model supports the following parameters:

Stop, Presence Penalty, Logit Bias, Top P, Temperature, Seed, Frequency Penalty, Logprobs, Max Tokens, Top Logprobs
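These names map onto standard OpenAI-style request fields. The sketch below passes several of them in one request (field names follow the OpenAI chat-completions convention; whether a given provider honors each one is an assumption worth verifying):

```python
# Sketch of a request exercising the supported sampling parameters.
# OpenAI-style field names; per-provider support is assumed, not guaranteed.
from openai import OpenAI

client = OpenAI(base_url="https://integrate.api.nvidia.com/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="nvidia/llama-3.3-nemotron-super-49b-v1",
    messages=[
        {"role": "system", "content": "detailed thinking off"},
        {"role": "user", "content": "List three uses of retrieval-augmented generation."},
    ],
    temperature=0.6,          # Temperature
    top_p=0.95,               # Top P
    max_tokens=512,           # Max Tokens
    frequency_penalty=0.1,    # Frequency Penalty
    presence_penalty=0.0,     # Presence Penalty
    seed=42,                  # Seed (best-effort determinism)
    stop=["</answer>"],       # Stop sequences
    logprobs=True,            # Logprobs
    top_logprobs=5,           # Top Logprobs
    logit_bias={},            # Logit Bias (token-ID -> bias map)
)
print(response.choices[0].message.content)
```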
Performance Summary

NVIDIA's Llama 3.3 Nemotron Super 49B v1 delivers competitive overall performance, with particular strengths in cost-effectiveness and reliability. Response times rank in the 54th percentile for speed across benchmarks, pricing places it in the 72nd percentile (indicating favorable operational costs), and reliability reaches the 83rd percentile, reflecting consistent, dependable output with minimal technical failures.

Across benchmark categories, the model excels in General Knowledge (98.0% accuracy) and Email Classification (97.0%), demonstrating strong factual recall and precise categorization, and posts a solid 56.6% on Instruction Following. It is markedly weaker in Coding, scoring just 6.0% accuracy at high cost and long duration, which suggests it is not well suited to complex programming tasks. Results in Reasoning (50.0%) and Ethics (94.0% accuracy, but only the 29th percentile) leave room for improvement in complex logical deduction and nuanced ethical judgment relative to peers.

In summary, its key strengths are high reliability, cost-efficiency, and strong general-knowledge and classification performance, while its 128K-token context length and ability to fit on a single high-performance GPU are significant deployment advantages. Its primary weakness is limited proficiency in coding, and potentially in complex reasoning, which may require further optimization for those applications.

Model Pricing

Current Pricing

| Feature | Price (per 1M tokens) |
|---|---|
| Prompt | $0.13 |
| Completion | $0.40 |
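At these rates, per-request cost is simple arithmetic on token counts. A small illustrative helper (the token counts in the example are made up):

```python
# Cost estimate at the listed rates: $0.13 per 1M prompt tokens,
# $0.40 per 1M completion tokens.
PROMPT_PRICE_PER_M = 0.13
COMPLETION_PRICE_PER_M = 0.40

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-million-token rates."""
    return (prompt_tokens * PROMPT_PRICE_PER_M
            + completion_tokens * COMPLETION_PRICE_PER_M) / 1_000_000

# Example: a 4,000-token prompt with a 1,000-token completion (illustrative numbers).
print(f"${request_cost(4_000, 1_000):.6f}")  # -> $0.000920
```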


Available Endpoints
| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
|---|---|---|---|---|
| Nebius | nvidia/llama-3.3-nemotron-super-49b-v1 | 131K | $0.13 / 1M tokens | $0.40 / 1M tokens |