Author's Description
NVIDIA's Llama 3.1 Nemotron 70B is a language model designed for generating precise and useful responses. Built on the [Llama 3.1 70B](/models/meta-llama/llama-3.1-70b-instruct) architecture and tuned with Reinforcement Learning from Human Feedback (RLHF), it excels on automatic alignment benchmarks. The model is tailored for applications requiring high accuracy in helpfulness and response generation, and is suitable for diverse user queries across multiple domains. Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
Key Specifications
Supported Parameters
This model supports the following parameters:
Features
This model supports the following features:
Performance Summary
NVIDIA's Llama 3.1 Nemotron 70B Instruct ranks in the 45th percentile for response speed and the 70th percentile for price, making it a reasonably fast, cost-effective option. Its strongest result is Email Classification, at 99.0% accuracy (89th percentile), indicating excellent contextual understanding for categorization tasks. Its ability to appropriately acknowledge uncertainty is another notable strength, with 97.6% accuracy on Hallucinations (63rd percentile). Instruction Following is moderate at 44.4% accuracy. However, the model struggles significantly with complex analytical tasks, scoring low in Mathematics (17.0%, 14th percentile), Reasoning (36.0%, 24th percentile), and especially Coding (2.0%, 9th percentile). General Knowledge (93.8%, 39th percentile) and Ethics (89.0%, 23rd percentile) also leave room for improvement. Overall, the model excels at classification and at avoiding hallucinations, making it well suited to applications that require precise, non-hallucinatory responses, but it is less effective for tasks demanding advanced mathematical, logical, or coding proficiency.
Model Pricing
Current Pricing
| Feature | Price (per 1M tokens) |
|---|---|
| Prompt | $0.60 |
| Completion | $0.60 |
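Since prompt and completion tokens are billed at the same rate, a request's cost is simply total tokens × $0.60 per million. A minimal sketch of the arithmetic (the token counts in the example are hypothetical):

```python
PROMPT_PRICE = 0.60 / 1_000_000       # USD per prompt token
COMPLETION_PRICE = 0.60 / 1_000_000   # USD per completion token

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed rates."""
    return prompt_tokens * PROMPT_PRICE + completion_tokens * COMPLETION_PRICE

# Example: a 2,000-token prompt with an 800-token completion
print(round(request_cost(2_000, 800), 6))  # 2,800 tokens ≈ $0.00168
```

Note that Infermatic's endpoint (listed below) charges $1.00 per million tokens instead, so per-provider rates should be substituted accordingly.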
Available Endpoints
| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
|---|---|---|---|---|
| Lambda | nvidia/llama-3.1-nemotron-70b-instruct | 131K | $0.60 / 1M tokens | $0.60 / 1M tokens |
| DeepInfra | nvidia/llama-3.1-nemotron-70b-instruct | 131K | $0.60 / 1M tokens | $0.60 / 1M tokens |
| Together | nvidia/llama-3.1-nemotron-70b-instruct | 32K | $0.60 / 1M tokens | $0.60 / 1M tokens |
| Infermatic | nvidia/llama-3.1-nemotron-70b-instruct | 32K | $1.00 / 1M tokens | $1.00 / 1M tokens |
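Providers like these typically expose an OpenAI-compatible chat-completions interface addressed by the model slug above. The sketch below only constructs the JSON request body; the endpoint URL, authentication, and transport vary by provider and are left out as assumptions:

```python
import json

MODEL_ID = "nvidia/llama-3.1-nemotron-70b-instruct"

def build_chat_request(user_message: str, max_tokens: int = 512) -> str:
    """Build a JSON body for an OpenAI-compatible /v1/chat/completions call."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_message}],
        # Keep prompt + completion within the endpoint's context window
        # (32K on Together/Infermatic, 131K on Lambda/DeepInfra).
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }
    return json.dumps(payload)

body = build_chat_request("Classify this email: 'Your invoice is attached.'")
print(body)
```

Note that the two 32K-context endpoints will reject requests that the 131K endpoints accept, so the context length column matters as much as price when choosing a provider.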
Benchmark Results
Other Models by nvidia
| Model | Released | Params | Context | Modalities | Speed | Ability | Cost |
|---|---|---|---|---|---|---|---|
| NVIDIA: Nemotron Nano 9B V2 | Sep 05, 2025 | 9B | 128K | Text input / Text output | ★ | ★★ | $ |
| NVIDIA: Llama 3.3 Nemotron Super 49B v1 (Unavailable) | Apr 08, 2025 | 49B | 131K | Text input / Text output | ★★★ | ★★ | $$ |
| NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 | Apr 08, 2025 | 253B | 131K | Text input / Text output | ★ | ★★ | $$$$$ |