NVIDIA: Llama 3.1 Nemotron 70B Instruct

Text input → Text output
Author's Description

NVIDIA's Llama 3.1 Nemotron 70B is a language model designed for generating precise and useful responses. Leveraging the [Llama 3.1 70B](/models/meta-llama/llama-3.1-70b-instruct) architecture and Reinforcement Learning from Human Feedback (RLHF), it excels on automatic alignment benchmarks. The model is tailored for applications that require highly helpful, accurate responses and handles diverse user queries across multiple domains. Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).

Key Specifications
| Spec | Value |
|---|---|
| Cost | $$$ |
| Context | 131K |
| Parameters | 70B |
| Released | Oct 14, 2024 |
Supported Parameters

This model supports the following parameters:

Tool Choice, Response Format, Seed, Top P, Temperature, Top Logprobs, Tools, Logit Bias, Logprobs, Stop, Min P, Max Tokens, Frequency Penalty, Presence Penalty
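
Most of these map directly onto standard chat-completions request fields. The sketch below is a hedged example, assuming the model is served behind an OpenAI-compatible endpoint; the gateway URL and API key are placeholders, and Min P, Logit Bias, and Logprobs are omitted for brevity.

```python
from openai import OpenAI

# Minimal sketch, assuming an OpenAI-compatible gateway serves this model.
# The base URL and API key are placeholders, not values from this page.
client = OpenAI(
    base_url="https://example-gateway/api/v1",  # placeholder endpoint (assumption)
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="nvidia/llama-3.1-nemotron-70b-instruct",
    messages=[{"role": "user", "content": "Summarize RLHF in two sentences."}],
    temperature=0.7,         # Temperature: sampling randomness
    top_p=0.9,               # Top P: nucleus sampling cutoff
    max_tokens=256,          # Max Tokens: cap on completion length
    frequency_penalty=0.0,   # Frequency Penalty: discourage token repetition
    presence_penalty=0.0,    # Presence Penalty: discourage topic repetition
    seed=42,                 # Seed: best-effort reproducible sampling
    stop=["###"],            # Stop: optional stop sequence(s)
)
print(response.choices[0].message.content)
```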
Features

This model supports the following features:

Tools, Response Format
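
Both features can be exercised through the same chat-completions interface. A minimal sketch follows, again assuming an OpenAI-compatible endpoint with placeholder credentials; the `get_weather` tool is hypothetical and exists only for illustration.

```python
from openai import OpenAI

client = OpenAI(base_url="https://example-gateway/api/v1", api_key="YOUR_API_KEY")  # placeholders

# Tools: describe a callable function the model may choose to invoke.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

tool_response = client.chat.completions.create(
    model="nvidia/llama-3.1-nemotron-70b-instruct",
    messages=[{"role": "user", "content": "What's the weather in Lisbon?"}],
    tools=tools,
    tool_choice="auto",  # Tool Choice: let the model decide whether to call the tool
)
for call in tool_response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)  # structured arguments as JSON text

# Response Format: constrain the reply to a valid JSON object.
json_response = client.chat.completions.create(
    model="nvidia/llama-3.1-nemotron-70b-instruct",
    messages=[{"role": "user", "content": "Give the capital of Portugal as JSON with keys city and country."}],
    response_format={"type": "json_object"},
)
print(json_response.choices[0].message.content)
```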
Performance Summary

NVIDIA's Llama 3.1 Nemotron 70B Instruct delivers competitive response times, ranking at the 45th percentile for speed, and is competitively priced, ranking at the 70th percentile for price. The model performs strongly in specific areas, particularly Email Classification, where it achieves 99.0% accuracy (89th percentile), indicating excellent contextual understanding for categorization tasks. Its ability to acknowledge uncertainty is another notable strength, with 97.6% accuracy on the Hallucinations benchmark (63rd percentile). Instruction Following is moderate at 44.4% accuracy.

However, the model struggles with complex analytical tasks, as shown by low accuracy in Mathematics (17.0%, 14th percentile), Reasoning (36.0%, 24th percentile), and especially Coding (2.0%, 9th percentile). General Knowledge and Ethics also leave room for improvement, at 93.8% (39th percentile) and 89.0% (23rd percentile) respectively. Overall, the model excels at classification and at avoiding hallucinations, making it suitable for applications that require precise, non-hallucinatory responses, but it is less effective for tasks demanding advanced mathematical, logical, or coding proficiency.

Model Pricing

Current Pricing

| Feature | Price (per 1M tokens) |
|---|---|
| Prompt | $0.60 |
| Completion | $0.60 |
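
At the listed rate of $0.60 per million tokens for both prompt and completion, per-request cost is straightforward arithmetic. A quick sketch with made-up token counts:

```python
# Cost per token at $0.60 per 1M tokens (both prompt and completion).
PROMPT_RATE = 0.60 / 1_000_000
COMPLETION_RATE = 0.60 / 1_000_000

# Hypothetical request/response sizes, chosen only for illustration.
prompt_tokens = 12_000
completion_tokens = 800

cost_usd = prompt_tokens * PROMPT_RATE + completion_tokens * COMPLETION_RATE
print(f"${cost_usd:.6f}")  # -> $0.007680
```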

Available Endpoints
| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
|---|---|---|---|---|
| Lambda | nvidia/llama-3.1-nemotron-70b-instruct | 131K | $0.60 / 1M tokens | $0.60 / 1M tokens |
| DeepInfra | nvidia/llama-3.1-nemotron-70b-instruct | 131K | $0.60 / 1M tokens | $0.60 / 1M tokens |
| Together | nvidia/llama-3.1-nemotron-70b-instruct | 32K | $0.60 / 1M tokens | $0.60 / 1M tokens |
| Infermatic | nvidia/llama-3.1-nemotron-70b-instruct | 32K | $1.00 / 1M tokens | $1.00 / 1M tokens |
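
Several providers serve the same model slug at different context lengths and prices. If the serving gateway supports provider routing (an assumption about the gateway, not something stated on this page), a preference order could be passed as an extra request field, as in this hedged sketch:

```python
from openai import OpenAI

client = OpenAI(base_url="https://example-gateway/api/v1", api_key="YOUR_API_KEY")  # placeholders

response = client.chat.completions.create(
    model="nvidia/llama-3.1-nemotron-70b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    # Assumed gateway-specific routing field; prefer the 131K-context providers first.
    extra_body={"provider": {"order": ["Lambda", "DeepInfra"]}},
)
print(response.choices[0].message.content)
```

Note that only the Lambda and DeepInfra endpoints expose the full 131K context window; the Together and Infermatic endpoints cap it at 32K.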