DeepSeek: R1 Distill Qwen 1.5B

Modality: Text input → Text output
Author's Description

DeepSeek R1 Distill Qwen 1.5B is a distilled large language model based on [Qwen 2.5 Math 1.5B](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B), fine-tuned on outputs from [DeepSeek R1](/deepseek/deepseek-r1). It is a very small, efficient model that outperforms [GPT 4o 0513](/openai/gpt-4o-2024-05-13) on math benchmarks. Benchmark results include:

- AIME 2024 pass@1: 28.9
- AIME 2024 cons@64: 52.7
- MATH-500 pass@1: 83.9

By distilling DeepSeek R1's outputs, the model achieves performance competitive with much larger frontier models.
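The two AIME metrics measure different things: pass@1 scores each single sample on its own, while cons@64 (consensus, i.e. majority voting) draws 64 samples per problem and grades only the most common final answer. A minimal sketch of both, using hypothetical toy data:

```python
from collections import Counter

def pass_at_1(answers, correct):
    """pass@1: fraction of independent single samples that are correct."""
    return sum(a == correct for a in answers) / len(answers)

def cons_at_k(samples_per_problem, correct_answers):
    """cons@k (majority voting): for each problem, take the most common
    final answer among k samples and grade only that answer."""
    hits = 0
    for samples, correct in zip(samples_per_problem, correct_answers):
        majority, _ = Counter(samples).most_common(1)[0]
        hits += majority == correct
    return hits / len(correct_answers)

# Toy example: 2 problems, 4 samples each (hypothetical data).
samples = [["42", "42", "41", "42"], ["7", "8", "8", "9"]]
keys = ["42", "8"]
print(cons_at_k(samples, keys))  # → 1.0: the majority vote is right on both
```

Majority voting filters out sporadic sampling errors, which is why cons@64 (52.7) is markedly higher than pass@1 (28.9) on AIME 2024.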

Key Specifications
Cost
$$$
Context
131K
Parameters
1.5B
Released
Jan 31, 2025
Supported Parameters

This model supports the following parameters:

Max Tokens, Presence Penalty, Frequency Penalty, Logit Bias, Include Reasoning, Response Format, Temperature, Top P, Stop, Min P, Reasoning
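As a sketch of how these parameters map onto a request, the snippet below builds a chat-completion payload for an OpenAI-compatible endpoint. The endpoint URL and the exact wire-level parameter names (`include_reasoning`, `min_p`, etc.) are assumptions based on the list above; check your provider's documentation before relying on them.

```python
import json
import urllib.request

# Hypothetical request payload; field names follow the parameter list above.
payload = {
    "model": "deepseek/deepseek-r1-distill-qwen-1.5b",
    "messages": [{"role": "user", "content": "Solve: 12 * 13 = ?"}],
    "max_tokens": 512,
    "temperature": 0.6,
    "top_p": 0.95,
    "include_reasoning": True,  # ask the provider to return the reasoning trace
}

req = urllib.request.Request(
    "https://openrouter.ai/api/v1/chat/completions",  # assumed endpoint
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer <API_KEY>",  # placeholder credential
        "Content-Type": "application/json",
    },
)
# Uncomment to actually send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

A low-to-moderate temperature (around 0.6) is a common choice for math-heavy reasoning models, trading a little diversity for more stable derivations.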
Features

This model supports the following features:

Response Format, Reasoning
Performance Summary

DeepSeek R1 Distill Qwen 1.5B demonstrates competitive response times, performing among the faster models with a 40th percentile speed ranking across five benchmarks. It also offers competitive pricing, ranking in the 60th percentile for cost efficiency.

While excelling in mathematical reasoning, as evidenced by its impressive AIME 2024 and MATH-500 scores, the model exhibits mixed performance across other baseline categories. Its primary strength lies in Reasoning, achieving a strong 74% accuracy that places it in the 74th percentile. This aligns with its mathematical prowess, suggesting robust logical processing.

However, the model shows significant weaknesses elsewhere. Its accuracy is notably low in Ethics (14th percentile), Coding (20th percentile), Email Classification (6th percentile), and General Knowledge (18th percentile), indicating a limited breadth of knowledge in these domains relative to its reasoning ability. Despite its efficiency and mathematical strengths, its overall utility for tasks requiring broad factual recall, ethical discernment, or general coding knowledge may be limited.

Model Pricing

Current Pricing

| Feature | Price (per 1M tokens) |
|---|---|
| Prompt | $0.18 |
| Completion | $0.18 |
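Since prompt and completion tokens are billed at the same flat rate, per-request cost is a simple linear function of total token count. A quick sketch (the example token counts are hypothetical):

```python
# Both rates from the pricing table above: $0.18 per 1M tokens.
PROMPT_RATE = 0.18 / 1_000_000      # $ per prompt token
COMPLETION_RATE = 0.18 / 1_000_000  # $ per completion token

def request_cost(prompt_tokens, completion_tokens):
    """Dollar cost of one request at the listed flat rates."""
    return prompt_tokens * PROMPT_RATE + completion_tokens * COMPLETION_RATE

# E.g. a 2,000-token prompt producing a 10,000-token reasoning trace:
print(f"${request_cost(2_000, 10_000):.6f}")  # → $0.002160
```

Note that reasoning models often emit long chains of thought, so completion tokens tend to dominate the bill even at this low rate.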

Available Endpoints
| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
|---|---|---|---|---|
| Together | deepseek/deepseek-r1-distill-qwen-1.5b | 131K | $0.18 / 1M tokens | $0.18 / 1M tokens |