NVIDIA: Llama 3.1 Nemotron 70B Instruct

Text input Text output
Author's Description

NVIDIA's Llama 3.1 Nemotron 70B is a language model designed for generating precise and useful responses. Leveraging [Llama 3.1 70B](/models/meta-llama/llama-3.1-70b-instruct) architecture and Reinforcement Learning from Human Feedback (RLHF), it excels in automatic alignment benchmarks. This model is tailored for applications requiring high accuracy in helpfulness and response generation, suitable for diverse user queries across multiple domains. Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).

Key Specifications
Cost
$$
Context
131K
Parameters
70B
Released
Oct 14, 2024
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Stop Presence Penalty Logit Bias Temperature Seed Response Format Frequency Penalty Max Tokens Tool Choice Top P Min P Tools Logprobs Top Logprobs
Features

This model supports the following features:

Tools Response Format
Performance Summary

NVIDIA's Llama 3.1 Nemotron 70B demonstrates competitive response times, performing among the faster models with a 52nd percentile ranking across six benchmarks. It also offers cost-effective solutions, ranking in the 73rd percentile for price. The model exhibits strong performance in classification tasks, achieving 99.0% accuracy in Email Classification, placing it in the 89th percentile for that category. It also shows solid capabilities in General Knowledge (93.8% accuracy) and Instruction Following (44.4% accuracy). However, a significant weakness is observed in Coding (Baseline), where it achieved only 2.0% accuracy, ranking in the 10th percentile. Its performance in Ethics (89.0% accuracy) and Reasoning (52.0% accuracy) is moderate, falling around the 25th and 41st percentiles respectively. While generally efficient in terms of cost and speed, the model's utility for coding-specific applications appears limited based on these benchmarks. Its strengths lie in its ability to accurately classify information and handle general knowledge queries, making it suitable for applications requiring high accuracy in helpfulness and response generation in these domains.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.12
Completion $0.3

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Lambda
Lambda | nvidia/llama-3.1-nemotron-70b-instruct 131K $0.12 / 1M tokens $0.3 / 1M tokens
DeepInfra
DeepInfra | nvidia/llama-3.1-nemotron-70b-instruct 131K $0.12 / 1M tokens $0.3 / 1M tokens
Together
Together | nvidia/llama-3.1-nemotron-70b-instruct 32K $0.88 / 1M tokens $0.88 / 1M tokens
Infermatic
Infermatic | nvidia/llama-3.1-nemotron-70b-instruct 32K $1 / 1M tokens $1 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Free Executions Accuracy Cost Duration
Other Models by nvidia