NVIDIA: Llama 3.1 Nemotron 70B Instruct

Text input → Text output
Author's Description

NVIDIA's Llama 3.1 Nemotron 70B is a language model designed for generating precise and useful responses. Leveraging the [Llama 3.1 70B](/models/meta-llama/llama-3.1-70b-instruct) architecture and Reinforcement Learning from Human Feedback (RLHF), it excels on automatic alignment benchmarks. The model is tailored for applications that require highly helpful, accurate responses and handles diverse user queries across multiple domains. Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).

Key Specifications
| Spec | Value |
|---|---|
| Cost | $$$ |
| Context | 131K |
| Parameters | 70B |
| Released | Oct 14, 2024 |
Supported Parameters

This model supports the following parameters:

Tool Choice, Response Format, Seed, Top P, Temperature, Top Logprobs, Tools, Logit Bias, Logprobs, Stop, Min P, Max Tokens, Frequency Penalty, Presence Penalty
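
Most of these map directly onto standard chat-completions request fields. The sketch below is a hedged example, assuming the model is served behind an OpenAI-compatible endpoint; the gateway URL and API key are placeholders, and Min P, Logit Bias, and Logprobs are omitted for brevity.

```python
from openai import OpenAI

# Minimal sketch, assuming an OpenAI-compatible gateway serves this model.
# The base URL and API key are placeholders, not values from this page.
client = OpenAI(
    base_url="https://example-gateway/api/v1",  # placeholder endpoint (assumption)
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="nvidia/llama-3.1-nemotron-70b-instruct",
    messages=[{"role": "user", "content": "Summarize RLHF in two sentences."}],
    temperature=0.7,         # Temperature: sampling randomness
    top_p=0.9,               # Top P: nucleus sampling cutoff
    max_tokens=256,          # Max Tokens: cap on completion length
    frequency_penalty=0.0,   # Frequency Penalty: discourage token repetition
    presence_penalty=0.0,    # Presence Penalty: discourage topic repetition
    seed=42,                 # Seed: best-effort reproducible sampling
    stop=["###"],            # Stop: optional stop sequence(s)
)
print(response.choices[0].message.content)
```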
Features

This model supports the following features:

Tools, Response Format
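
Both features can be exercised through the same chat-completions interface. A minimal sketch follows, again assuming an OpenAI-compatible endpoint with placeholder credentials; the `get_weather` tool is hypothetical and exists only for illustration.

```python
from openai import OpenAI

client = OpenAI(base_url="https://example-gateway/api/v1", api_key="YOUR_API_KEY")  # placeholders

# Tools: describe a callable function the model may choose to invoke.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

tool_response = client.chat.completions.create(
    model="nvidia/llama-3.1-nemotron-70b-instruct",
    messages=[{"role": "user", "content": "What's the weather in Lisbon?"}],
    tools=tools,
    tool_choice="auto",  # Tool Choice: let the model decide whether to call the tool
)
for call in tool_response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)  # structured arguments as JSON text

# Response Format: constrain the reply to a valid JSON object.
json_response = client.chat.completions.create(
    model="nvidia/llama-3.1-nemotron-70b-instruct",
    messages=[{"role": "user", "content": "Give the capital of Portugal as JSON with keys city and country."}],
    response_format={"type": "json_object"},
)
print(json_response.choices[0].message.content)
```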
Performance Summary

NVIDIA's Llama 3.1 Nemotron 70B Instruct delivers competitive response times, ranking at the 45th percentile for speed, and is competitively priced, ranking at the 70th percentile for price. The model performs strongly in specific areas, particularly Email Classification, where it achieves 99.0% accuracy (89th percentile), indicating excellent contextual understanding for categorization tasks. Its ability to acknowledge uncertainty is another notable strength, with 97.6% accuracy on the Hallucinations benchmark (63rd percentile). Instruction Following is moderate at 44.4% accuracy.

However, the model struggles with complex analytical tasks, as shown by low accuracy in Mathematics (17.0%, 14th percentile), Reasoning (36.0%, 24th percentile), and especially Coding (2.0%, 9th percentile). General Knowledge and Ethics also leave room for improvement, at 93.8% (39th percentile) and 89.0% (23rd percentile) respectively. Overall, the model excels at classification and at avoiding hallucinations, making it suitable for applications that require precise, non-hallucinatory responses, but it is less effective for tasks demanding advanced mathematical, logical, or coding proficiency.

Model Pricing

Current Pricing

| Feature | Price (per 1M tokens) |
|---|---|
| Prompt | $0.60 |
| Completion | $0.60 |
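
At the listed rate of $0.60 per million tokens for both prompt and completion, per-request cost is straightforward arithmetic. A quick sketch with made-up token counts:

```python
# Cost per token at $0.60 per 1M tokens (both prompt and completion).
PROMPT_RATE = 0.60 / 1_000_000
COMPLETION_RATE = 0.60 / 1_000_000

# Hypothetical request/response sizes, chosen only for illustration.
prompt_tokens = 12_000
completion_tokens = 800

cost_usd = prompt_tokens * PROMPT_RATE + completion_tokens * COMPLETION_RATE
print(f"${cost_usd:.6f}")  # -> $0.007680
```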

Available Endpoints
| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
|---|---|---|---|---|
| Lambda | nvidia/llama-3.1-nemotron-70b-instruct | 131K | $0.60 / 1M tokens | $0.60 / 1M tokens |
| DeepInfra | nvidia/llama-3.1-nemotron-70b-instruct | 131K | $0.60 / 1M tokens | $0.60 / 1M tokens |
| Together | nvidia/llama-3.1-nemotron-70b-instruct | 32K | $0.60 / 1M tokens | $0.60 / 1M tokens |
| Infermatic | nvidia/llama-3.1-nemotron-70b-instruct | 32K | $1.00 / 1M tokens | $1.00 / 1M tokens |
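
Several providers serve the same model slug at different context lengths and prices. If the serving gateway supports provider routing (an assumption about the gateway, not something stated on this page), a preference order could be passed as an extra request field, as in this hedged sketch:

```python
from openai import OpenAI

client = OpenAI(base_url="https://example-gateway/api/v1", api_key="YOUR_API_KEY")  # placeholders

response = client.chat.completions.create(
    model="nvidia/llama-3.1-nemotron-70b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    # Assumed gateway-specific routing field; prefer the 131K-context providers first.
    extra_body={"provider": {"order": ["Lambda", "DeepInfra"]}},
)
print(response.choices[0].message.content)
```

Note that only the Lambda and DeepInfra endpoints expose the full 131K context window; the Together and Infermatic endpoints cap it at 32K.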