Author's Description
DeepSeek-R1-Distill-Qwen-7B is a 7-billion-parameter dense language model distilled from DeepSeek-R1, using reasoning data generated by DeepSeek's larger, reinforcement-learning-trained models. The distillation transfers advanced reasoning, math, and code capabilities into a smaller, more efficient architecture based on Qwen2.5-Math-7B. The model performs strongly on mathematical benchmarks (92.8% pass@1 on MATH-500), coding tasks (Codeforces rating 1189), and general reasoning (49.1% pass@1 on GPQA Diamond), achieving accuracy competitive with much larger models at lower inference cost.
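The distillation described here is supervised fine-tuning on reasoning traces sampled from the larger teacher (hard-label distillation), not logit matching. A minimal sketch of that training objective, using NumPy with toy shapes in place of a real training framework (the function name and dimensions are illustrative, not from DeepSeek's code):

```python
import numpy as np

def sft_distill_loss(student_logits, teacher_token_ids):
    """Cross-entropy of the student's next-token predictions against
    tokens taken from teacher-generated reasoning traces, i.e. ordinary
    supervised fine-tuning on teacher data (hard-label distillation)."""
    # log-softmax over the vocabulary axis, numerically stabilized
    z = student_logits - student_logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    # log-probability the student assigns to each teacher token
    nll = -log_probs[np.arange(len(teacher_token_ids)), teacher_token_ids]
    return nll.mean()

# toy example: 4 token positions, vocabulary of 8 tokens
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 8))
teacher_tokens = np.array([1, 5, 2, 7])
loss = sft_distill_loss(logits, teacher_tokens)
```

Minimizing this loss pushes the student to reproduce the teacher's reasoning traces token by token, which is how the math and code behavior transfers without running RL on the 7B model itself.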
Key Specifications
Supported Parameters
This model supports the following parameters:
Features
This model supports the following features:
Performance Summary
DeepSeek-R1-Distill-Qwen-7B stands out for operational efficiency, consistently ranking among the fastest models and offering highly competitive pricing across benchmarks. Distilled from DeepSeek-R1, this 7-billion-parameter model uses advanced reasoning data to strike a strong balance between capability and cost.

On accuracy, the picture is mixed. The model's published results are impressive: 92.8% pass@1 on MATH-500, a Codeforces rating of 1189, and 49.1% pass@1 on GPQA Diamond. Against the baseline benchmarks here, its clearest strength is Coding (Baseline), where it reaches 66.0% accuracy (32nd percentile) at efficient cost and duration. It scored 0.0%, however, on the Ethics, Email Classification, Reasoning, and General Knowledge baselines.

This gap suggests that the model has strong specialized capabilities but underperforms on these particular baseline evaluations of ethical understanding, classification, general reasoning, and knowledge recall. Further investigation into how those baseline tests are constructed, relative to the model's described strengths, would be worthwhile.
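The pass@1 figures cited above are commonly computed with the unbiased pass@k estimator from the HumanEval evaluation methodology: sample n completions per problem, count the c correct ones, and estimate the chance that at least one of k randomly chosen samples passes. A small sketch (the sampling counts are illustrative):

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator: of n sampled completions, c were
    correct. Returns the probability that at least one of k randomly
    chosen completions is correct: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # too few failures to fill k slots: guaranteed pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 16 samples per problem, 12 correct: pass@1 reduces to c/n
p1 = pass_at_k(16, 12, 1)  # 0.75
```

With k=1 the estimator is simply the fraction of correct samples, which is why pass@1 scores like 92.8% can be read as per-sample accuracy averaged over the benchmark.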
Model Pricing
Current Pricing
| Feature | Price (per 1M tokens) |
|---|---|
| Prompt | $0.10 |
| Completion | $0.20 |
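At these rates, request cost is linear in token counts. A quick sketch of the arithmetic (the example token counts are made up):

```python
# Rates from the pricing table above, in dollars per 1M tokens
PROMPT_PER_M = 0.10
COMPLETION_PER_M = 0.20

def request_cost(prompt_tokens, completion_tokens):
    """Dollar cost of a single request at the listed rates."""
    return (prompt_tokens * PROMPT_PER_M
            + completion_tokens * COMPLETION_PER_M) / 1_000_000

# e.g. a 2,000-token prompt with a 4,000-token reasoning-heavy completion
cost = request_cost(2_000, 4_000)  # ≈ $0.001
```

Note that reasoning-style models tend to emit long chains of thought, so completion tokens, billed at twice the prompt rate, usually dominate the bill.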
Price History
Available Endpoints
| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
|---|---|---|---|---|
| GMICloud | deepseek/deepseek-r1-distill-qwen-7b | 131K | $0.10 / 1M tokens | $0.20 / 1M tokens |
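Endpoints like this are typically exposed through an OpenAI-compatible chat-completions API, addressed by the model slug in the table. A sketch of building such a request payload; the base URL is a placeholder (substitute the provider's actual endpoint), and only the payload construction is shown:

```python
import json

# Hypothetical base URL -- replace with the provider's real endpoint.
BASE_URL = "https://api.example.com/v1/chat/completions"

def build_request(prompt, max_tokens=1024):
    """Build an OpenAI-compatible chat-completions payload for the
    model slug listed in the endpoints table."""
    return {
        "model": "deepseek/deepseek-r1-distill-qwen-7b",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = json.dumps(build_request("Prove that sqrt(2) is irrational."))
```

The payload would be POSTed to the endpoint with an `Authorization: Bearer <key>` header; exact authentication and extra parameters depend on the provider.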
Benchmark Results
| Benchmark | Category | Reasoning | Free | Executions | Accuracy | Cost | Duration |
|---|---|---|---|---|---|---|---|
Other Models by deepseek
| Model | Released | Params | Context | Modalities | Speed | Ability | Cost |
|---|---|---|---|---|---|---|---|
| DeepSeek: Deepseek R1 0528 Qwen3 8B | May 29, 2025 | 8B | 131K | Text input, text output | ★ | ★★★★★ | $$$ |
| DeepSeek: R1 0528 | May 28, 2025 | ~671B | 128K | Text input, text output | ★ | ★★★★★ | $$$$$ |
| DeepSeek: DeepSeek Prover V2 | Apr 30, 2025 | ~671B | 131K | Text input, text output | ★★★★ | ★★★★★ | $$$$ |
| DeepSeek: DeepSeek V3 0324 | Mar 24, 2025 | ~685B | 163K | Text input, text output | ★★★ | ★★★★★ | $$$ |
| DeepSeek: R1 Distill Llama 8B | Feb 07, 2025 | 8B | 32K | Text input, text output | ★ | ★★★ | $$ |
| DeepSeek: R1 Distill Qwen 1.5B | Jan 31, 2025 | 1.5B | 131K | Text input, text output | ★★★ | ★ | $$$ |
| DeepSeek: R1 Distill Qwen 32B | Jan 29, 2025 | 32B | 131K | Text input, text output | ★ | ★★★★★ | $$$ |
| DeepSeek: R1 Distill Qwen 14B | Jan 29, 2025 | 14B | 64K | Text input, text output | ★ | ★★★ | $$$ |
| DeepSeek: R1 Distill Llama 70B | Jan 23, 2025 | 70B | 131K | Text input, text output | ★ | ★★★★★ | $$$$ |
| DeepSeek: R1 | Jan 20, 2025 | ~671B | 128K | Text input, text output | ★★ | ★★★★ | $$$$ |
| DeepSeek: DeepSeek V3 | Dec 26, 2024 | — | 163K | Text input, text output | ★★★ | ★★★★ | $$$ |