Author's Description
NVIDIA's Llama 3.1 Nemotron 70B is a language model designed for generating precise and useful responses. Leveraging the [Llama 3.1 70B](/models/meta-llama/llama-3.1-70b-instruct) architecture and Reinforcement Learning from Human Feedback (RLHF), it excels...
Performance Summary
NVIDIA's Llama 3.1 Nemotron 70B Instruct demonstrates competitive response times, ranking among the faster models at the 49th percentile for speed, and is cost-effective, ranking in the 73rd percentile for price.

Its strengths are concentrated in classification and factual reliability. It shows excellent accuracy in Email Classification (99.0%, 91st percentile), indicating a robust understanding of context and purpose in categorization tasks. Its ability to acknowledge uncertainty is another notable strength: it achieves 97.6% accuracy on the Hallucinations (Baseline) test, suggesting a low propensity for fabricating information.

However, the model shows significant weaknesses in complex reasoning and knowledge-intensive domains. Its scores in Mathematics (17.0% accuracy, 11th percentile), Reasoning (36.0%, 19th percentile), and Coding (2.0%, 8th percentile) are considerably low, suggesting limitations in intricate problem-solving, logical deduction, and programming-specific queries. General Knowledge (93.8%, 33rd percentile) and Ethics (89.0%, 20th percentile) also fall below average compared to other models, and Instruction Following is moderate at 44.4% (35th percentile).

Overall, the model is well suited to classification and tasks requiring precise, non-hallucinatory responses, but less so to complex analytical or knowledge-heavy applications.
Model Pricing
Current Pricing
| Feature | Price (per 1M tokens) |
|---|---|
| Prompt | $1.20 |
| Completion | $1.20 |
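With identical prompt and completion rates, estimating a request's cost is simple arithmetic. The helper below is an illustrative sketch (not part of any provider SDK); the default rates match the $1.20 per 1M tokens listed above.

```python
def estimate_cost_usd(prompt_tokens: int, completion_tokens: int,
                      prompt_price: float = 1.20,
                      completion_price: float = 1.20) -> float:
    """Estimate request cost in USD, given prices quoted per 1M tokens."""
    return (prompt_tokens * prompt_price
            + completion_tokens * completion_price) / 1_000_000

# A 2,000-token prompt with a 500-token completion:
cost = estimate_cost_usd(2_000, 500)  # → 0.003 (about a third of a cent)
```

At these rates the prompt and completion sides cost the same, so only the total token count matters for this model.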
Available Endpoints
| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
|---|---|---|---|---|
| Lambda | nvidia/llama-3.1-nemotron-70b-instruct | 131K | $1.20 / 1M tokens | $1.20 / 1M tokens |
| DeepInfra | nvidia/llama-3.1-nemotron-70b-instruct | 131K | $1.20 / 1M tokens | $1.20 / 1M tokens |
| Together | nvidia/llama-3.1-nemotron-70b-instruct | 32K | $1.20 / 1M tokens | $1.20 / 1M tokens |
| Infermatic | nvidia/llama-3.1-nemotron-70b-instruct | 32K | $1.20 / 1M tokens | $1.20 / 1M tokens |
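All four providers expose the same endpoint name, so a request body only needs the model identifier from the table. The snippet below is a hedged sketch of a generic OpenAI-compatible chat-completions payload; the base URL, authentication header, and exact parameter support vary by provider (Lambda, DeepInfra, Together, Infermatic) and should be checked against each provider's own documentation.

```python
import json

# Illustrative chat-completions request body for this model.
# The model id comes from the endpoints table; the message content,
# max_tokens, and temperature values here are arbitrary examples.
payload = {
    "model": "nvidia/llama-3.1-nemotron-70b-instruct",
    "messages": [
        {"role": "user",
         "content": "Classify this email as spam or not spam: ..."}
    ],
    "max_tokens": 256,
    "temperature": 0.2,
}
body = json.dumps(payload)  # serialized JSON to POST to the provider's endpoint
```

Note that the usable context differs by provider: 131K tokens on Lambda and DeepInfra versus 32K on Together and Infermatic, so long-context requests should target the former.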
Other Models by nvidia
| Model | Released | Params | Context | Modalities | Speed | Ability | Cost |
|---|---|---|---|---|---|---|---|
| NVIDIA: Nemotron 3 Super | Mar 11, 2026 | 120B | 262K | Text input; text output | ★★★ | ★★★ | $$$$ |
| NVIDIA: Nemotron 3 Nano 30B A3B | Dec 14, 2025 | 30B | 262K | Text input; text output | ★★★ | ★★★★★ | $$$ |
| NVIDIA: Nemotron Nano 12B 2 VL | Oct 28, 2025 | 12B | 131K | Text, image, video input; text output | ★ | ★★ | $$$$ |
| NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 | Oct 10, 2025 | 49B | 131K | Text input; text output | ★★ | ★★★★ | $$$$ |
| NVIDIA: Nemotron Nano 9B V2 | Sep 05, 2025 | 9B | 128K | Text input; text output | ★ | ★★ | $ |
| NVIDIA: Llama 3.3 Nemotron Super 49B v1 (Unavailable) | Apr 08, 2025 | 49B | 131K | Text input; text output | ★★★ | ★★ | $$ |
| NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 | Apr 08, 2025 | 253B | 131K | Text input; text output | ★ | ★★ | $$$$$ |