Author's Description
DeepSeek R1 Distill Llama 8B is a distilled large language model based on [Llama-3.1-8B-Instruct](/meta-llama/llama-3.1-8b-instruct), fine-tuned on outputs from [DeepSeek R1](/deepseek/deepseek-r1). This distillation from R1's outputs enables competitive performance comparable to larger frontier models across multiple benchmarks, including:

- AIME 2024 pass@1: 50.4
- MATH-500 pass@1: 89.1
- CodeForces rating: 1205

Hugging Face:

- [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B)
- [DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B)
Performance Summary
DeepSeek R1 Distill Llama 8B, a distilled large language model based on Llama-3.1-8B-Instruct, demonstrates a compelling balance of performance and cost-efficiency. While its speed ranking places it among the slower models (8th percentile), it consistently offers highly competitive pricing (84th percentile), making it a cost-effective option for various applications.

In terms of benchmark performance, the model exhibits notable strengths in specific areas. It achieves a respectable 81.0% accuracy in the "Coding (Baseline)" benchmark, placing it in the 56th percentile, and performs well in "Reasoning (Baseline)" with 66.0% accuracy (62nd percentile). These results, coupled with impressive AIME 2024 pass@1 (50.4) and MATH-500 pass@1 (89.1) scores, highlight its capabilities in complex problem-solving and technical domains.

However, the model shows weaknesses in "Ethics (Baseline)" and "Email Classification (Baseline)," with accuracy scores of 74.0% (22nd percentile) and 92.0% (22nd percentile) respectively. Its "General Knowledge (Baseline)" performance is also modest at 72.8% accuracy (25th percentile).

Despite these areas for improvement, its competitive performance in coding and reasoning, combined with its cost-effectiveness, positions DeepSeek R1 Distill Llama 8B as a valuable option for applications where budget and specific technical proficiencies are key considerations.
Model Pricing
Current Pricing
| Feature | Price (per 1M tokens) |
|---|---|
| Prompt | $0.04 |
| Completion | $0.04 |
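Since prompt and completion tokens share the same flat rate, per-request cost is simple arithmetic. A minimal sketch (the helper name `estimate_cost` is ours, not part of any provider SDK):

```python
# Flat rates from the pricing table above: $0.04 per 1M tokens each way.
PROMPT_PRICE_PER_M = 0.04      # USD per 1M prompt tokens
COMPLETION_PRICE_PER_M = 0.04  # USD per 1M completion tokens

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated USD cost of one request at the listed flat rates."""
    return (prompt_tokens * PROMPT_PRICE_PER_M
            + completion_tokens * COMPLETION_PRICE_PER_M) / 1_000_000

# e.g. a 2,000-token prompt producing a 500-token completion:
print(f"${estimate_cost(2000, 500):.6f}")  # → $0.000100
```

At these rates, even a million tokens round-trip costs well under a dime, which is what the 84th-percentile pricing ranking above reflects.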
Available Endpoints
| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
|---|---|---|---|---|
| Novita | deepseek/deepseek-r1-distill-llama-8b | 32K | $0.04 / 1M tokens | $0.04 / 1M tokens |
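The endpoint name above is the model identifier you pass in an OpenAI-compatible chat-completions request. A minimal sketch — `BASE_URL` and `API_KEY` are placeholders to substitute with your provider's actual values, not documented endpoints:

```python
import json

BASE_URL = "https://example-provider.invalid/v1/chat/completions"  # placeholder
API_KEY = "YOUR_API_KEY"  # placeholder

# Request body: the model field uses the endpoint name from the table above.
payload = {
    "model": "deepseek/deepseek-r1-distill-llama-8b",
    "messages": [
        {"role": "user", "content": "Prove that the sum of two odd integers is even."}
    ],
    "max_tokens": 1024,
}

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# To actually send the request (requires the third-party `requests` package):
# import requests
# resp = requests.post(BASE_URL, headers=headers, data=json.dumps(payload))
# print(resp.json()["choices"][0]["message"]["content"])
print(json.dumps(payload, indent=2))
```

Note the 32K context length at this endpoint: prompt plus completion tokens must fit within that window.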
Benchmark Results

| Benchmark | Accuracy |
|---|---|
| Coding (Baseline) | 81.0% |
| Reasoning (Baseline) | 66.0% |
| Ethics (Baseline) | 74.0% |
| Email Classification (Baseline) | 92.0% |
| General Knowledge (Baseline) | 72.8% |
Other Models by deepseek

| Model | Released | Params | Context | Modalities | Speed | Ability | Cost |
|---|---|---|---|---|---|---|---|
| DeepSeek: R1 Distill Qwen 7B | May 30, 2025 | 7B | 131K | Text input / output | ★ | ★ | $$$$ |
| DeepSeek: Deepseek R1 0528 Qwen3 8B | May 29, 2025 | 8B | 131K | Text input / output | ★ | ★★★★★ | $$$ |
| DeepSeek: R1 0528 | May 28, 2025 | ~671B | 128K | Text input / output | ★ | ★★★★★ | $$$$$ |
| DeepSeek: DeepSeek Prover V2 | Apr 30, 2025 | ~671B | 131K | Text input / output | ★★★★ | ★★★★★ | $$$$ |
| DeepSeek: DeepSeek V3 0324 | Mar 24, 2025 | ~685B | 163K | Text input / output | ★★★ | ★★★★★ | $$$ |
| DeepSeek: R1 Distill Qwen 1.5B | Jan 31, 2025 | 1.5B | 131K | Text input / output | ★★★ | ★ | $$$ |
| DeepSeek: R1 Distill Qwen 32B | Jan 29, 2025 | 32B | 131K | Text input / output | ★ | ★★★★★ | $$$ |
| DeepSeek: R1 Distill Qwen 14B | Jan 29, 2025 | 14B | 64K | Text input / output | ★ | ★★★ | $$$ |
| DeepSeek: R1 Distill Llama 70B | Jan 23, 2025 | 70B | 131K | Text input / output | ★ | ★★★★★ | $$$$ |
| DeepSeek: R1 | Jan 20, 2025 | ~671B | 128K | Text input / output | ★★ | ★★★★ | $$$$ |
| DeepSeek: DeepSeek V3 | Dec 26, 2024 | — | 163K | Text input / output | ★★★ | ★★★★ | $$$ |