DeepSeek: R1 Distill Qwen 7B

Modalities: text input, text output. Status: currently unavailable.
Author's Description

DeepSeek-R1-Distill-Qwen-7B is a 7-billion-parameter dense language model distilled from DeepSeek-R1, using reinforcement-learning-enhanced reasoning data generated by DeepSeek's larger models. Distillation transfers advanced reasoning, math, and code capabilities into a smaller, more efficient architecture based on Qwen2.5-Math-7B. The model performs strongly on mathematical benchmarks (92.8% pass@1 on MATH-500), coding tasks (Codeforces rating 1189), and general reasoning (49.1% pass@1 on GPQA Diamond), achieving accuracy competitive with larger models at lower inference cost.

Key Specifications
Cost: $$$$
Context: 131K tokens
Parameters: 7B
Released: May 30, 2025
Supported Parameters

This model supports the following parameters:

Top P, Seed, Max Tokens, Reasoning, Include Reasoning, Temperature
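These parameters map directly onto an OpenAI-style chat-completions request body, which is the convention most providers serving this model follow. A minimal sketch of such a payload, assuming the common field names (`include_reasoning` in particular mirrors the "Include Reasoning" parameter above and may be spelled differently by a given provider):

```python
# Sketch of a chat-completions payload using this model's supported parameters.
# Field names follow the widely used OpenAI-compatible schema; "include_reasoning"
# is an assumed spelling and may vary by provider.

def build_payload(prompt: str) -> dict:
    return {
        "model": "deepseek/deepseek-r1-distill-qwen-7b",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,        # sampling temperature
        "top_p": 0.95,             # nucleus-sampling cutoff
        "seed": 42,                # best-effort reproducibility
        "max_tokens": 1024,        # cap on completion length
        "include_reasoning": True, # ask for the reasoning trace
    }

payload = build_payload("Solve: what is 12 * 13?")
```

The payload would then be POSTed as JSON to the provider's chat-completions endpoint with the usual bearer-token authorization header.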
Features

This model supports the following features:

Reasoning
Performance Summary

DeepSeek-R1-Distill-Qwen-7B's standout quality is operational efficiency: it consistently ranks among the fastest models and offers highly competitive pricing, making it a cost-effective option for a wide range of applications.

Accuracy, however, varies sharply by task category. The author's description cites strong results on mathematical benchmarks (92.8% pass@1 on MATH-500), coding tasks (Codeforces rating 1189), and general reasoning (49.1% pass@1 on GPQA Diamond), yet the baseline benchmark runs tell a different story: the model scored 0.0% on the Ethics, Email Classification, Reasoning, and General Knowledge benchmarks, which suggests either a genuine limitation on these task types or a problem with the baseline evaluation methodology for those categories. Performance on Instruction Following (34.3% accuracy) and Coding (66.0% accuracy) is moderate, falling in the 20th-51st percentile range for both accuracy and duration.

In short, the model is best suited to applications where speed and cost are paramount, and where the task matches the math and code strengths claimed in the model description rather than the baseline benchmark results.

Model Pricing

Current Pricing

Feature      Price (per 1M tokens)
-----------  ---------------------
Prompt       $0.10
Completion   $0.20
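At these rates, per-request cost is linear in token counts. A quick sketch of the arithmetic, using the prices from the table above:

```python
# Per-token pricing from the table above, expressed per 1M tokens.
PROMPT_PRICE = 0.10      # USD per 1M prompt tokens
COMPLETION_PRICE = 0.20  # USD per 1M completion tokens

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in USD of a single request at the listed rates."""
    return (prompt_tokens * PROMPT_PRICE
            + completion_tokens * COMPLETION_PRICE) / 1_000_000

# e.g. 4,000 prompt tokens and 1,000 completion tokens:
cost = request_cost(4_000, 1_000)  # ≈ $0.0006
```

Even a long request with a full reasoning trace therefore costs well under a cent, which is where the model's cost-effectiveness claim comes from.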

Available Endpoints
Provider   Endpoint Name                                     Context Length  Pricing (Input)    Pricing (Output)
GMICloud   GMICloud | deepseek/deepseek-r1-distill-qwen-7b   131K            $0.10 / 1M tokens  $0.20 / 1M tokens