DeepSeek: R1 Distill Llama 70B

Text input · Text output · Free option available
Author's Description

DeepSeek R1 Distill Llama 70B is a distilled large language model based on [Llama-3.3-70B-Instruct](/meta-llama/llama-3.3-70b-instruct), fine-tuned on outputs from [DeepSeek R1](/deepseek/deepseek-r1). The distillation yields high performance across multiple benchmarks, including:

- AIME 2024 pass@1: 70.0
- MATH-500 pass@1: 94.5
- CodeForces rating: 1633

Fine-tuning on DeepSeek R1's outputs enables performance competitive with larger frontier models.
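The pass@1 figures above are conventionally computed with the unbiased pass@k estimator introduced alongside HumanEval (Chen et al., 2021); the minimal sketch below assumes that convention applies to these benchmarks as well.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).
    n = total samples generated per problem, c = samples that pass, k = budget."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# For k=1 the estimator reduces to the raw success rate c / n:
print(pass_at_k(n=100, c=70, k=1))  # 0.7, i.e. a 70.0 pass@1
```
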

Key Specifications

| Spec | Value |
|------------|--------------|
| Cost | $$ |
| Context | 131K tokens |
| Parameters | 70B |
| Released | Jan 23, 2025 |

Supported Parameters

This model supports the following parameters:

Include Reasoning, Response Format, Stop, Max Tokens, Top P, Frequency Penalty, Reasoning, Min P, Seed, Temperature, Presence Penalty
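
As an illustration, here is a minimal sketch of passing these parameters through an OpenAI-compatible client. The base URL, model slug handling, and the `min_p`/`include_reasoning` pass-through are assumptions modeled on OpenRouter-style gateways, not documented behavior of this page.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # assumed OpenRouter-style gateway
    api_key="YOUR_API_KEY",
)

resp = client.chat.completions.create(
    model="deepseek/deepseek-r1-distill-llama-70b",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    max_tokens=1024,
    temperature=0.6,
    top_p=0.95,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    seed=42,                    # for more repeatable sampling
    stop=["</answer>"],         # illustrative stop sequence
    # Parameters without native OpenAI SDK fields go through extra_body:
    extra_body={"min_p": 0.01, "include_reasoning": True},
)
print(resp.choices[0].message.content)
```
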
Features

This model supports the following features:

Reasoning, Response Format
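
Reusing the client from the sketch above, the two features can be combined: JSON-mode output via `response_format`, with the reasoning trace requested alongside it. The `reasoning` attribute on the message follows common gateway conventions and is an assumption, not documented behavior.

```python
resp = client.chat.completions.create(
    model="deepseek/deepseek-r1-distill-llama-70b",
    messages=[{"role": "user", "content": 'Reply with JSON: {"answer": <int>} for 17 * 23.'}],
    response_format={"type": "json_object"},   # structured output
    extra_body={"include_reasoning": True},    # ask for the reasoning trace
)
print(resp.choices[0].message.content)                      # the JSON answer
print(getattr(resp.choices[0].message, "reasoning", None))  # reasoning trace, if exposed
```
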
Performance Summary

DeepSeek R1 Distill Llama 70B shows a strong overall profile, anchored by reliability and specific academic benchmarks. Its speed ranking sits in the 18th percentile, indicating generally longer response times, while its pricing is competitive at the 47th percentile. Reliability is a standout: a 97% success rate across benchmarks points to consistently usable outputs.

The model excels in specialized areas, scoring 70.0 pass@1 on AIME 2024 and 94.5 pass@1 on MATH-500, alongside a CodeForces rating of 1633. Benchmark accuracy is high in General Knowledge (99.8%) and Reasoning (84.0%), with solid results in Ethics (99.0%) and Coding (87.0%), and a perfect 100.0% on one Instruction Following benchmark, achieved with a fast completion time. Weaker spots are Hallucinations (90.0% accuracy) and Mathematics (79.0%, in the lower half of models). The model's primary weakness remains speed, with duration rankings in the lower percentiles across most benchmarks.

Model Pricing

Current Pricing

| Feature | Price (per 1M tokens) |
|------------|-----------------------|
| Prompt | $0.50 |
| Completion | $1.00 |
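
As a worked example of the list price above, per-request cost is linear in token counts:

```python
# List price: $0.50 per 1M prompt tokens, $1.00 per 1M completion tokens.
PROMPT_PRICE = 0.50 / 1_000_000      # $ per prompt token
COMPLETION_PRICE = 1.00 / 1_000_000  # $ per completion token

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    return prompt_tokens * PROMPT_PRICE + completion_tokens * COMPLETION_PRICE

# e.g. a 2,000-token prompt with an 8,000-token reasoning-heavy reply:
print(f"${request_cost(2_000, 8_000):.4f}")  # $0.0090
```
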


Available Endpoints

| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
|--------------|----------------------------------------|------|--------------------|--------------------|
| DeepInfra | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.50 / 1M tokens | $1.00 / 1M tokens |
| InferenceNet | deepseek/deepseek-r1-distill-llama-70b | 128K | $0.03 / 1M tokens | $0.13 / 1M tokens |
| Lambda | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.03 / 1M tokens | $0.13 / 1M tokens |
| Phala | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.03 / 1M tokens | $0.13 / 1M tokens |
| GMICloud | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.03 / 1M tokens | $0.13 / 1M tokens |
| Nebius | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.03 / 1M tokens | $0.13 / 1M tokens |
| SambaNova | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.70 / 1M tokens | $1.40 / 1M tokens |
| Groq | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.03 / 1M tokens | $0.13 / 1M tokens |
| Novita | deepseek/deepseek-r1-distill-llama-70b | 32K | $0.03 / 1M tokens | $0.13 / 1M tokens |
| Together | deepseek/deepseek-r1-distill-llama-70b | 131K | $2.00 / 1M tokens | $2.00 / 1M tokens |
| Cerebras | deepseek/deepseek-r1-distill-llama-70b | 32K | $0.03 / 1M tokens | $0.13 / 1M tokens |
| Chutes | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.03 / 1M tokens | $0.13 / 1M tokens |
| Novita | deepseek/deepseek-r1-distill-llama-70b | 32K | $0.80 / 1M tokens | $0.80 / 1M tokens |
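
Since endpoint pricing varies by more than an order of magnitude, a routing preference can be worth setting. The sketch below reuses the client from earlier and assumes an OpenRouter-style `provider` preference object; the field shape is an assumption, not documented behavior of this page.

```python
resp = client.chat.completions.create(
    model="deepseek/deepseek-r1-distill-llama-70b",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "provider": {
            "order": ["Groq", "Lambda"],  # try low-cost endpoints from the table first
            "allow_fallbacks": True,      # fall back to other providers if unavailable
        }
    },
)
print(resp.choices[0].message.content)
```
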