Author's Description
DeepSeek R1 Distill Qwen 1.5B is a distilled large language model based on [Qwen 2.5 Math 1.5B](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). It's a very small and efficient model which outperforms [GPT 4o 0513](/openai/gpt-4o-2024-05-13) on math benchmarks.

Other benchmark results include:

- AIME 2024 pass@1: 28.9
- AIME 2024 cons@64: 52.7
- MATH-500 pass@1: 83.9

The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.
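The description does not define pass@1 or cons@64. The sketch below follows a common reading of these metrics: pass@1 as the fraction of sampled answers that are correct, and cons@k as whether the majority-vote answer over k samples is correct. The helper names and the sample answers are made up for illustration, and the author's exact evaluation protocol may differ.

```python
from collections import Counter


def pass_at_1(samples: list[str], reference: str) -> float:
    """Fraction of sampled answers that match the reference (assumed definition)."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(answer == reference for answer in samples) / len(samples)


def cons_at_k(samples: list[str], reference: str) -> bool:
    """True if the most common sampled answer matches the reference (assumed definition)."""
    if not samples:
        raise ValueError("need at least one sample")
    majority_answer, _ = Counter(samples).most_common(1)[0]
    return majority_answer == reference


# Toy data: five sampled answers to one problem whose reference answer is "204".
samples = ["204", "204", "197", "204", "210"]
print(pass_at_1(samples, "204"))  # 0.6
print(cons_at_k(samples, "204"))  # True
```

Under this reading, a consensus score can sit well above pass@1, as in the AIME 2024 figures, because majority voting recovers problems where only a plurality of samples is correct.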
Performance Summary
DeepSeek R1 Distill Qwen 1.5B ranks in the 40th percentile for speed across five benchmarks and in the 60th percentile for cost efficiency.

The model's main strength is Reasoning, where it reaches 74% accuracy (74th percentile), consistent with its AIME 2024 and MATH-500 scores. It is much weaker elsewhere: its accuracy falls in the 14th percentile for Ethics, the 20th for Coding, the 6th for Email Classification, and the 18th for General Knowledge.

In practice, the model is efficient and strong at mathematical reasoning, but it may be a poor fit for tasks that need broad factual recall, ethical judgment, or general coding knowledge.
Model Pricing
Current Pricing
Feature | Price (per 1M tokens) |
---|---|
Prompt | $0.18 |
Completion | $0.18 |
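As a rough illustration of how the per-token prices above translate into a per-request cost, the following sketch multiplies token counts by the listed rates. The function name and the example token counts are hypothetical, and actual billing may round or meter tokens differently.

```python
from decimal import Decimal

# Prices taken from the Current Pricing table, in USD per 1M tokens.
PROMPT_PRICE_PER_M = Decimal("0.18")
COMPLETION_PRICE_PER_M = Decimal("0.18")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper)."""
    if prompt_tokens < 0 or completion_tokens < 0:
        raise ValueError("token counts must be non-negative")
    prompt_cost = PROMPT_PRICE_PER_M * prompt_tokens / TOKENS_PER_M
    completion_cost = COMPLETION_PRICE_PER_M * completion_tokens / TOKENS_PER_M
    return prompt_cost + completion_cost


# Example: a 2,000-token prompt with an 8,000-token completion.
print(estimate_cost(2_000, 8_000))
```

Because prompt and completion tokens are priced identically here, the estimate depends only on the total token count. Reasoning-style models tend to produce long completions, so the completion side usually dominates.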
Available Endpoints
Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
---|---|---|---|---|
Together | deepseek/deepseek-r1-distill-qwen-1.5b | 131K | $0.18 / 1M tokens | $0.18 / 1M tokens |
Benchmark Results
Benchmark | Category | Reasoning | Free | Executions | Accuracy | Cost | Duration |
---|---|---|---|---|---|---|---|
Other Models by DeepSeek
Model | Released | Params | Context | Modalities | Speed | Ability | Cost |
---|---|---|---|---|---|---|---|
DeepSeek: R1 Distill Qwen 7B | May 30, 2025 | 7B | 131K | Text input, text output | ★ | ★ | $$$$ |
DeepSeek: Deepseek R1 0528 Qwen3 8B | May 29, 2025 | 8B | 131K | Text input, text output | ★ | ★★★★★ | $$$ |
DeepSeek: R1 0528 | May 28, 2025 | ~671B | 128K | Text input, text output | ★ | ★★★★★ | $$$$$ |
DeepSeek: DeepSeek Prover V2 | Apr 30, 2025 | ~671B | 131K | Text input, text output | ★★★★ | ★★★★★ | $$$$ |
DeepSeek: DeepSeek V3 0324 | Mar 24, 2025 | ~685B | 163K | Text input, text output | ★★★ | ★★★★★ | $$$ |
DeepSeek: R1 Distill Llama 8B | Feb 07, 2025 | 8B | 32K | Text input, text output | ★ | ★★★ | $$ |
DeepSeek: R1 Distill Qwen 32B | Jan 29, 2025 | 32B | 131K | Text input, text output | ★ | ★★★★★ | $$$ |
DeepSeek: R1 Distill Qwen 14B | Jan 29, 2025 | 14B | 64K | Text input, text output | ★ | ★★★ | $$$ |
DeepSeek: R1 Distill Llama 70B | Jan 23, 2025 | 70B | 131K | Text input, text output | ★ | ★★★★★ | $$$$ |
DeepSeek: R1 | Jan 20, 2025 | ~671B | 128K | Text input, text output | ★★ | ★★★★ | $$$$ |
DeepSeek: DeepSeek V3 | Dec 26, 2024 | — | 163K | Text input, text output | ★★★ | ★★★★ | $$$ |