NVIDIA: Llama 3.1 Nemotron 70B Instruct

Text input Text output
Author's Description

NVIDIA's Llama 3.1 Nemotron 70B is a language model designed for generating precise and useful responses. Leveraging [Llama 3.1 70B](/models/meta-llama/llama-3.1-70b-instruct) architecture and Reinforcement Learning from Human Feedback (RLHF), it excels...

Key Specifications
Cost
$$
Context
131K
Parameters
70B
Released
Oct 14, 2024
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Seed Min P Response Format Temperature Presence Penalty Top Logprobs Tools Frequency Penalty Top P Logprobs Stop Tool Choice Max Tokens Logit Bias
Features

This model supports the following features:

Response Format Tools
Performance Summary

NVIDIA's Llama 3.1 Nemotron 70B Instruct demonstrates competitive response times, performing among the faster models with a 49th percentile speed ranking. It also offers cost-effective solutions, ranking in the 73rd percentile for price. The model exhibits strong performance in specific areas. It shows excellent accuracy in Email Classification (99.0%, 91st percentile), indicating a robust understanding of context and purpose for categorization tasks. Its ability to acknowledge uncertainty is also a notable strength, achieving 97.6% accuracy in Hallucinations (Baseline) tests, suggesting a low propensity for generating fabricated information. However, the model presents significant weaknesses in complex reasoning and knowledge-intensive domains. Its performance in Mathematics (17.0% accuracy, 11th percentile), Reasoning (36.0% accuracy, 19th percentile), and Coding (2.0% accuracy, 8th percentile) is considerably low, suggesting limitations in handling intricate problem-solving, logical deduction, and programming-specific queries. General Knowledge (93.8% accuracy, 33rd percentile) and Ethics (89.0% accuracy, 20th percentile) also fall below average compared to other models. Instruction Following shows moderate performance at 44.4% accuracy (35th percentile). Overall, the model is well-suited for classification and tasks requiring precise, non-hallucinatory responses, but less so for complex analytical or knowledge-heavy applications.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $1.2
Completion $1.2

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Lambda
Lambda | nvidia/llama-3.1-nemotron-70b-instruct 131K $1.2 / 1M tokens $1.2 / 1M tokens
DeepInfra
DeepInfra | nvidia/llama-3.1-nemotron-70b-instruct 131K $1.2 / 1M tokens $1.2 / 1M tokens
Together
Together | nvidia/llama-3.1-nemotron-70b-instruct 32K $1.2 / 1M tokens $1.2 / 1M tokens
Infermatic
Infermatic | nvidia/llama-3.1-nemotron-70b-instruct 32K $1.2 / 1M tokens $1.2 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by nvidia