Author's Description
DeepSeek R1 Distill Llama 70B is a distilled large language model based on [Llama-3.3-70B-Instruct](/meta-llama/llama-3.3-70b-instruct), fine-tuned on outputs from [DeepSeek R1](/deepseek/deepseek-r1). The distillation yields strong results across multiple benchmarks, including:

- AIME 2024 pass@1: 70.0
- MATH-500 pass@1: 94.5
- CodeForces Rating: 1633

By learning from DeepSeek R1's outputs, the model reaches performance competitive with larger frontier models.
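Pass@1 scores like those above are commonly computed by sampling n completions per problem, counting the c that are correct, and averaging the unbiased pass@k estimator 1 - C(n-c, k) / C(n, k) over all problems; for k = 1 this reduces to c/n. The sketch below assumes that convention, since the description does not say how many samples were drawn per problem. The function names are illustrative, not taken from any evaluation harness.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: number of sampled completions, c: number that were correct,
    k: attempt budget (must satisfy 1 <= k <= n).
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Fewer than k wrong samples: every k-subset contains a correct one.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k drawn samples are incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average per-problem pass@k over a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical counts for three problems, 16 samples each.
results = [(16, 16), (16, 8), (16, 0)]
print(benchmark_pass_at_k(results, k=1))  # (1.0 + 0.5 + 0.0) / 3 = 0.5
```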
Key Specifications
Supported Parameters
This model supports the following parameters:
Features
This model supports the following features:
Performance Summary
DeepSeek R1 Distill Llama 70B shows a strong overall profile, led by its reliability and its results on specific academic benchmarks. Its speed ranks in the 18th percentile, meaning responses are generally slower than those of most models, while its pricing is competitive, ranking in the 47th percentile. Its reliability stands out: it achieved a 97% success rate across benchmarks, which suggests consistent, usable outputs.

On specialized benchmarks it scores 70.0 pass@1 on AIME 2024 and 94.5 pass@1 on MATH-500, with a CodeForces rating of 1633. Benchmark accuracy is high in General Knowledge (99.8%), Ethics (99.0%), Coding (87.0%), and Reasoning (84.0%), and it reached 100.0% accuracy, with a short duration, on one Instruction Following benchmark. Its Hallucinations accuracy (90.0%) is moderate, and its Mathematics benchmark score (79.0%) falls in the lower half of models. Its main weakness is speed: it ranks in low percentiles for duration on most benchmarks.
Model Pricing
Current Pricing
Token type | Price (per 1M tokens) |
---|---|
Prompt | $0.5 |
Completion | $1 |
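At these rates, a request costs its prompt tokens × $0.5 / 1M plus its completion tokens × $1 / 1M. The minimal sketch below assumes that any reasoning text the model generates is counted and billed as completion tokens, which can dominate the bill for R1-style models that reason at length before answering. The `request_cost` name is hypothetical, and `Decimal` is used to avoid floating-point rounding in currency arithmetic.

```python
from decimal import Decimal

# Rates from the "Current Pricing" table, in USD per 1M tokens.
PROMPT_PRICE_PER_M = Decimal("0.5")
COMPLETION_PRICE_PER_M = Decimal("1")
TOKENS_PER_M = Decimal(1_000_000)


def request_cost(prompt_tokens: int, completion_tokens: int,
                 prompt_price: Decimal = PROMPT_PRICE_PER_M,
                 completion_price: Decimal = COMPLETION_PRICE_PER_M) -> Decimal:
    """Estimate the USD cost of one request at per-1M-token rates.

    Assumption: completion_tokens already includes any reasoning output.
    """
    if prompt_tokens < 0 or completion_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = prompt_tokens * prompt_price + completion_tokens * completion_price
    return total / TOKENS_PER_M


# A 2,000-token prompt with a 6,000-token, reasoning-heavy completion.
print(request_cost(2_000, 6_000))
```

Passing different `prompt_price` and `completion_price` values lets the same function estimate costs for any row of the endpoints table.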
Available Endpoints
Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
---|---|---|---|---|
DeepInfra | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.5 / 1M tokens | $1 / 1M tokens |
InferenceNet | deepseek/deepseek-r1-distill-llama-70b | 128K | $0.03 / 1M tokens | $0.13 / 1M tokens |
Lambda | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.03 / 1M tokens | $0.13 / 1M tokens |
Phala | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.03 / 1M tokens | $0.13 / 1M tokens |
GMICloud | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.03 / 1M tokens | $0.13 / 1M tokens |
Nebius | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.03 / 1M tokens | $0.13 / 1M tokens |
SambaNova | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.7 / 1M tokens | $1.4 / 1M tokens |
Groq | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.03 / 1M tokens | $0.13 / 1M tokens |
Novita | deepseek/deepseek-r1-distill-llama-70b | 32K | $0.03 / 1M tokens | $0.13 / 1M tokens |
Together | deepseek/deepseek-r1-distill-llama-70b | 131K | $2 / 1M tokens | $2 / 1M tokens |
Cerebras | deepseek/deepseek-r1-distill-llama-70b | 32K | $0.03 / 1M tokens | $0.13 / 1M tokens |
Chutes | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.03 / 1M tokens | $0.13 / 1M tokens |
Novita | deepseek/deepseek-r1-distill-llama-70b | 32K | $0.8 / 1M tokens | $0.8 / 1M tokens |
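Because per-provider rates and context limits vary widely, picking an endpoint is a small constrained minimization: discard endpoints whose context is too short, then take the cheapest. The sketch below is a hypothetical helper, not any provider's or router's actual selection logic; the `Endpoint`, `ENDPOINTS`, and `cheapest` names are illustrative, only a subset of the table's rows is included, and reading "131K" as 131,000 tokens is a conservative assumption.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    provider: str
    context_k: int         # listed context length, in thousands of tokens
    input_per_m: Decimal   # USD per 1M input (prompt) tokens
    output_per_m: Decimal  # USD per 1M output (completion) tokens

    def cost(self, prompt_tokens: int, completion_tokens: int) -> Decimal:
        """USD cost of one request at this endpoint's listed rates."""
        return (prompt_tokens * self.input_per_m
                + completion_tokens * self.output_per_m) / Decimal(1_000_000)


# A subset of rows copied from the endpoints table.
ENDPOINTS = [
    Endpoint("DeepInfra", 131, Decimal("0.5"), Decimal("1")),
    Endpoint("Groq", 131, Decimal("0.03"), Decimal("0.13")),
    Endpoint("Novita", 32, Decimal("0.03"), Decimal("0.13")),
    Endpoint("SambaNova", 131, Decimal("0.7"), Decimal("1.4")),
    Endpoint("Together", 131, Decimal("2"), Decimal("2")),
]


def cheapest(endpoints: list[Endpoint], prompt_tokens: int,
             completion_tokens: int) -> Optional[Endpoint]:
    """Return the lowest-cost endpoint whose context fits the request.

    Assumes "K" means 1,000 tokens and that prompt + completion must fit.
    Ties go to the earliest endpoint in the list; returns None if none fit.
    """
    needed = prompt_tokens + completion_tokens
    fitting = [e for e in endpoints if needed <= e.context_k * 1_000]
    if not fitting:
        return None
    return min(fitting, key=lambda e: e.cost(prompt_tokens, completion_tokens))


best = cheapest(ENDPOINTS, prompt_tokens=40_000, completion_tokens=10_000)
if best is not None:
    print(best.provider, best.cost(40_000, 10_000))
```

Cost is the only criterion here; a real choice might also weigh latency and throughput, which this sketch does not model.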
Benchmark Results
Benchmark | Category | Reasoning | Strategy | Free | Executions | Accuracy | Cost | Duration |
---|---|---|---|---|---|---|---|---|
Other Models by DeepSeek
Model | Released | Params | Context | Modalities | Speed | Ability | Cost |
---|---|---|---|---|---|---|---|
DeepSeek: DeepSeek V3.2 Exp | Sep 29, 2025 | Not listed | 131K | Text input, text output | ★★★ | ★★★★★ | $$$ |
DeepSeek: DeepSeek V3.1 Terminus | Sep 22, 2025 | ~671B | 131K | Text input, text output | ★★★★ | ★★★★★ | $$$$ |
DeepSeek: DeepSeek V3.1 | Aug 21, 2025 | ~671B | 131K | Text input, text output | ★★ | ★★★★ | $$$ |
DeepSeek: DeepSeek V3.1 Base (unavailable) | Aug 20, 2025 | ~671B | 163K | Text input, text output | ★ | ★ | $$ |
DeepSeek: R1 Distill Qwen 7B (unavailable) | May 30, 2025 | 7B | 131K | Text input, text output | ★ | ★ | $$$$ |
DeepSeek: DeepSeek R1 0528 Qwen3 8B | May 29, 2025 | 8B | 131K | Text input, text output | ★★★ | ★★★ | $$ |
DeepSeek: R1 0528 | May 28, 2025 | ~671B | 128K | Text input, text output | ★★★ | ★★★ | $$$ |
DeepSeek: DeepSeek Prover V2 | Apr 30, 2025 | ~671B | 131K | Text input, text output | ★★ | ★★★★ | $$$$ |
DeepSeek: DeepSeek V3 Base (unavailable) | Mar 29, 2025 | ~671B | 163K | Text input, text output | ★ | ★ | $$$ |
DeepSeek: DeepSeek V3 0324 | Mar 24, 2025 | ~685B | 163K | Text input, text output | ★★★★ | ★★★★★ | $$ |
DeepSeek: R1 Distill Llama 8B (unavailable) | Feb 07, 2025 | 8B | 32K | Text input, text output | ★ | ★★ | $$ |
DeepSeek: R1 Distill Qwen 1.5B (unavailable) | Jan 31, 2025 | 1.5B | 131K | Text input, text output | ★★★ | ★ | $$$ |
DeepSeek: R1 Distill Qwen 32B | Jan 29, 2025 | 32B | 131K | Text input, text output | ★ | ★★★★ | $$$ |
DeepSeek: R1 Distill Qwen 14B | Jan 29, 2025 | 14B | 32K | Text input, text output | ★ | ★★ | $$$ |
DeepSeek: R1 | Jan 20, 2025 | ~671B | 128K | Text input, text output | ★★★ | ★★★★ | $$$ |
DeepSeek: DeepSeek V3 | Dec 26, 2024 | Not listed | 163K | Text input, text output | ★★★ | ★★★★ | $$$ |