DeepSeek: R1 Distill Qwen 1.5B

Name: DeepSeek: R1 Distill Qwen 1.5B
Brand: deepseek
Price: 1.8e-7 USD
Availability: InStock
Rating: 1.7 (5 reviews)

Back

Text input Text output

Author's Description

DeepSeek R1 Distill Qwen 1.5B is a distilled large language model based on [Qwen 2.5 Math 1.5B](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). It's a very small and efficient model which outperforms [GPT 4o 0513](/openai/gpt-4o-2024-05-13) on Math Benchmarks. Other benchmark results include: - AIME 2024 pass@1: 28.9 - AIME 2024 cons@64: 52.7 - MATH-500 pass@1: 83.9 The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.

Key Specifications

Cost

$$$

Context

131K

Parameters

Released

Jan 31, 2025

Speed

★★★

Ability

★

Reliability

★★★★★

Hugging Face

Supported Parameters

This model supports the following parameters:

Max Tokens Presence Penalty Frequency Penalty Logit Bias Include Reasoning Response Format Temperature Top P Stop Min P Reasoning

Features

This model supports the following features:

Response Format Reasoning

Performance Summary

DeepSeek R1 Distill Qwen 1.5B demonstrates competitive response times, performing among the faster models with a 40th percentile speed ranking across five benchmarks. It also offers competitive pricing, ranking in the 60th percentile for cost efficiency. While excelling in mathematical reasoning, as evidenced by its impressive AIME 2024 and MATH-500 scores, the model exhibits a mixed performance across other baseline categories. Its primary strength lies in Reasoning, achieving a strong 74% accuracy, placing it in the 74th percentile. This aligns with its mathematical prowess, suggesting a robust logical processing capability. However, the model shows significant weaknesses in other areas. Its accuracy is notably low in Ethics (14th percentile), Coding (20th percentile), Email Classification (6th percentile), and General Knowledge (18th percentile). This indicates a limited breadth of knowledge and understanding in these domains compared to its strong reasoning abilities. Despite its efficiency and mathematical strengths, its overall utility for tasks requiring broad factual recall, ethical discernment, or general coding knowledge may be limited.

Model Pricing

Current Pricing

Feature	Price (per 1M tokens)
Prompt	$0.18
Completion	$0.18

Price History

Available Endpoints

Provider	Endpoint Name	Context Length	Pricing (Input)	Pricing (Output)
Together	Together \| deepseek/deepseek-r1-distill-qwen-1.5b	131K	$0.18 / 1M tokens	$0.18 / 1M tokens

Benchmark Results

Benchmark	Category	Reasoning	Free	Executions	Accuracy	Cost	Duration

Other Models by deepseek

	Released	Params	Context	Filter by Modalities All Modalities	Speed	Ability	Cost
DeepSeek: R1 Distill Qwen 7B	May 30, 2025	7B	131K	Text input Text output	★	★	$$$$
DeepSeek: Deepseek R1 0528 Qwen3 8B	May 29, 2025	8B	131K	Text input Text output	★	★★★★★	$$$
DeepSeek: R1 0528	May 28, 2025	~671B	128K	Text input Text output	★	★★★★★	$$$$$
DeepSeek: DeepSeek Prover V2	Apr 30, 2025	~671B	131K	Text input Text output	★★★★	★★★★★	$$$$
DeepSeek: DeepSeek V3 0324	Mar 24, 2025	~685B	163K	Text input Text output	★★★	★★★★★	$$$
DeepSeek: R1 Distill Llama 8B	Feb 07, 2025	8B	32K	Text input Text output	★	★★★	$$
DeepSeek: R1 Distill Qwen 32B	Jan 29, 2025	32B	131K	Text input Text output	★	★★★★★	$$$
DeepSeek: R1 Distill Qwen 14B	Jan 29, 2025	14B	64K	Text input Text output	★	★★★	$$$
DeepSeek: R1 Distill Llama 70B	Jan 23, 2025	70B	131K	Text input Text output	★	★★★★★	$$$$
DeepSeek: R1	Jan 20, 2025	~671B	128K	Text input Text output	★★	★★★★	$$$$
DeepSeek: DeepSeek V3	Dec 26, 2024	—	163K	Text input Text output	★★★	★★★★	$$$