DeepSeek: R1 Distill Qwen 1.5B

Text input Text output Unavailable
Author's Description

DeepSeek R1 Distill Qwen 1.5B is a distilled large language model based on [Qwen 2.5 Math 1.5B](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). It's a very small and efficient model which outperforms [GPT 4o 0513](/openai/gpt-4o-2024-05-13) on Math Benchmarks. Other benchmark results include: - AIME 2024 pass@1: 28.9 - AIME 2024 cons@64: 52.7 - MATH-500 pass@1: 83.9 The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.

Key Specifications
Cost
$$$
Context
131K
Parameters
5B
Released
Jan 31, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Stop Reasoning Max Tokens Temperature Min P Top P Frequency Penalty Presence Penalty Logit Bias Include Reasoning
Features

This model supports the following features:

Reasoning
Performance Summary

DeepSeek R1 Distill Qwen 1.5B, a distilled model based on Qwen 2.5 Math 1.5B and fine-tuned with DeepSeek R1 outputs, demonstrates a moderate speed performance, ranking in the 39th percentile across five benchmarks. It offers competitive pricing, placing in the 52nd percentile. Despite its small size, the model exhibits exceptional performance on Math Benchmarks, notably outperforming GPT-4o 0513. Specific math benchmark results are impressive: AIME 2024 pass@1 at 28.9, AIME 2024 cons@64 at 52.7, and MATH-500 pass@1 at 83.9. This highlights a significant strength in mathematical reasoning and problem-solving. However, its performance across general benchmarks is considerably lower. It struggles with Instruction Following (12.1% accuracy, 24th percentile), Coding (21.0% accuracy, 17th percentile), General Knowledge (45.1% accuracy, 16th percentile), Email Classification (29.0% accuracy, 5th percentile), and Ethics (34.0% accuracy, 13th percentile). These results indicate a notable weakness in broader cognitive tasks and general domain understanding. The model's primary strength lies in its specialized mathematical capabilities, achieved with remarkable efficiency for its size.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.18
Completion $0.18

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Together
Together | deepseek/deepseek-r1-distill-qwen-1.5b 131K $0.18 / 1M tokens $0.18 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by deepseek