Author's Description
DeepSeek R1 Distill Qwen 14B is a distilled large language model based on [Qwen 2.5 14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B), fine-tuned on outputs from [DeepSeek R1](/deepseek/deepseek-r1). It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.

Other benchmark results include:

- AIME 2024 pass@1: 69.7
- MATH-500 pass@1: 93.9
- CodeForces Rating: 1481

Fine-tuning on DeepSeek R1's outputs gives the model performance competitive with larger frontier models.
Performance Summary
DeepSeek R1 Distill Qwen 14B is a capable distilled large language model with strong results on coding and reasoning benchmarks. It is slow, ranking in the 9th percentile for speed, but reasonably cost-effective, placing in the 62nd percentile for price.

Its strongest area is Code Generation: it achieves 93.0% accuracy on the Coding (Baseline) benchmark, which places it in the 94th percentile and makes it the most accurate model at its price point. Its Reasoning results are also robust, with 86.0% accuracy (85th percentile). By contrast, its results in Ethics (87.5% accuracy), Email Classification (93.0% accuracy), and General Knowledge (77.5% accuracy) rank only around the 25th to 26th percentile, indicating room for improvement.

Overall, the model's key strengths are coding and reasoning, which make it a strong contender for tasks that require complex problem-solving and code generation. Its primary weakness is slow processing speed, which may affect real-time applications. Even so, its competitive pricing and high accuracy in specific domains make it a valuable option where cost-efficiency and specialized performance matter more than raw speed.
Model Pricing
Current Pricing
Feature | Price (per 1M tokens) |
---|---|
Prompt | $0.15 |
Completion | $0.15 |
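As a rough illustration of how the per-token prices above translate into a per-request cost, here is a minimal sketch. The function name `estimate_cost` and the example token counts are hypothetical and chosen only for illustration; the two rates are taken from the table, and actual billing may differ by provider.

```python
from decimal import Decimal

# Rates from the pricing table above, in USD per 1M tokens.
PROMPT_PRICE_PER_M = Decimal("0.15")
COMPLETION_PRICE_PER_M = Decimal("0.15")


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper)."""
    million = Decimal(1_000_000)
    total = (PROMPT_PRICE_PER_M * prompt_tokens
             + COMPLETION_PRICE_PER_M * completion_tokens)
    return total / million


if __name__ == "__main__":
    # Assumed example: 2,000 prompt tokens and 8,000 completion tokens.
    print(estimate_cost(2_000, 8_000))
```

Because reasoning-style models tend to produce long completions, the completion term usually dominates such an estimate even though both rates are the same here.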
Available Endpoints
Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
---|---|---|---|---|
Novita | deepseek/deepseek-r1-distill-qwen-14b | 64K | $0.15 / 1M tokens | $0.15 / 1M tokens |
GMICloud | deepseek/deepseek-r1-distill-qwen-14b | 131K | $0.15 / 1M tokens | $0.15 / 1M tokens |
Together | deepseek/deepseek-r1-distill-qwen-14b | 131K | $0.15 / 1M tokens | $0.15 / 1M tokens |
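Since the endpoints listed above share the same price but differ in context length, one practical question is which providers can hold a given prompt. The sketch below hard-codes the table's values; the `Endpoint` type and `providers_with_context` helper are hypothetical names, and context lengths are kept in the table's "K" units because the exact token counts behind "64K" and "131K" are not stated.

```python
from typing import NamedTuple


class Endpoint(NamedTuple):
    provider: str
    context_k: int       # context length in thousands of tokens, as listed
    input_price: float   # USD per 1M input tokens
    output_price: float  # USD per 1M output tokens


# Values copied from the endpoint table above.
ENDPOINTS = [
    Endpoint("Novita", 64, 0.15, 0.15),
    Endpoint("GMICloud", 131, 0.15, 0.15),
    Endpoint("Together", 131, 0.15, 0.15),
]


def providers_with_context(min_context_k: int) -> list[str]:
    """Return providers whose listed context is at least min_context_k (hypothetical helper)."""
    return [e.provider for e in ENDPOINTS if e.context_k >= min_context_k]


if __name__ == "__main__":
    # A request needing roughly 100K tokens of context rules out the 64K endpoint.
    print(providers_with_context(100))
```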
Benchmark Results
Benchmark | Category | Reasoning | Free | Executions | Accuracy | Cost | Duration |
---|---|---|---|---|---|---|---|
Other Models by deepseek
Model | Released | Params | Context | Modalities | Speed | Ability | Cost |
---|---|---|---|---|---|---|---|
DeepSeek: R1 Distill Qwen 7B | May 30, 2025 | 7B | 131K | Text input, Text output | ★ | ★ | $$$$ |
DeepSeek: Deepseek R1 0528 Qwen3 8B | May 29, 2025 | 8B | 131K | Text input, Text output | ★ | ★★★★★ | $$$ |
DeepSeek: R1 0528 | May 28, 2025 | ~671B | 128K | Text input, Text output | ★ | ★★★★★ | $$$$$ |
DeepSeek: DeepSeek Prover V2 | Apr 30, 2025 | ~671B | 131K | Text input, Text output | ★★★★ | ★★★★★ | $$$$ |
DeepSeek: DeepSeek V3 0324 | Mar 24, 2025 | ~685B | 163K | Text input, Text output | ★★★ | ★★★★★ | $$$ |
DeepSeek: R1 Distill Llama 8B | Feb 07, 2025 | 8B | 32K | Text input, Text output | ★ | ★★★ | $$ |
DeepSeek: R1 Distill Qwen 1.5B | Jan 31, 2025 | 1.5B | 131K | Text input, Text output | ★★★ | ★ | $$$ |
DeepSeek: R1 Distill Qwen 32B | Jan 29, 2025 | 32B | 131K | Text input, Text output | ★ | ★★★★★ | $$$ |
DeepSeek: R1 Distill Llama 70B | Jan 23, 2025 | 70B | 131K | Text input, Text output | ★ | ★★★★★ | $$$$ |
DeepSeek: R1 | Jan 20, 2025 | ~671B | 128K | Text input, Text output | ★★ | ★★★★ | $$$$ |
DeepSeek: DeepSeek V3 | Dec 26, 2024 | — | 163K | Text input, Text output | ★★★ | ★★★★ | $$$ |