DeepSeek: R1 Distill Qwen 7B

Modalities: text input → text output
Author's Description

DeepSeek-R1-Distill-Qwen-7B is a 7-billion-parameter dense language model distilled from DeepSeek-R1, leveraging reinforcement-learning-enhanced reasoning data generated by DeepSeek's larger models. The distillation process transfers advanced reasoning, math, and code capabilities into a smaller, more efficient architecture based on Qwen2.5-Math-7B. The model demonstrates strong performance on mathematical benchmarks (92.8% pass@1 on MATH-500), coding tasks (Codeforces rating 1189), and general reasoning (49.1% pass@1 on GPQA Diamond), achieving accuracy competitive with larger models at a much lower inference cost.

Key Specifications
Cost: $$$$
Context: 131K tokens
Parameters: 7B
Released: May 30, 2025
Supported Parameters

This model supports the following parameters:

Seed, Max Tokens, Include Reasoning, Temperature, Top P, Reasoning
Features

This model supports the following features:

Reasoning
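The supported parameters above map naturally onto an OpenAI-compatible chat-completions request. A minimal sketch, assuming an OpenRouter-style payload schema (the exact field names, in particular `include_reasoning`, are an assumption and should be checked against the provider's API reference):

```python
# Sketch: building a chat-completions payload that exercises every
# parameter this model supports. Assumes an OpenRouter-style schema;
# field names are not confirmed by this page.

def build_request(prompt: str) -> dict:
    return {
        "model": "deepseek/deepseek-r1-distill-qwen-7b",
        "messages": [{"role": "user", "content": prompt}],
        "seed": 42,                 # Seed: reproducible sampling
        "max_tokens": 1024,         # Max Tokens: cap on completion length
        "temperature": 0.6,         # Temperature: sampling randomness
        "top_p": 0.95,              # Top P: nucleus sampling cutoff
        "include_reasoning": True,  # Include Reasoning: return the reasoning trace
    }

payload = build_request("Prove that the sum of two even integers is even.")
print(payload["model"])
```

This payload would then be POSTed to the provider's chat-completions endpoint with an API key in the `Authorization` header.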
Performance Summary

DeepSeek-R1-Distill-Qwen-7B stands out for operational efficiency, consistently ranking among the fastest models while remaining competitively priced across benchmarks. Distilled from DeepSeek-R1, this 7-billion-parameter model leverages advanced reasoning data to balance capability and cost-effectiveness.

The evaluation picture is mixed, however. The author's description cites strong results on mathematical benchmarks (92.8% pass@1 on MATH-500), coding tasks (Codeforces rating 1189), and general reasoning (49.1% pass@1 on GPQA Diamond). In the baseline benchmarks reported here, the model's clearest strength is the "Coding (Baseline)" category, where it reached 66.0% accuracy (32nd percentile) with efficient cost and duration. Yet it recorded 0.0% accuracy on the "Ethics," "Email Classification," "Reasoning," and "General Knowledge" baselines.

This discrepancy suggests that while the model has strong specialized capabilities, its performance on these baseline evaluations points to weaknesses in broader ethical understanding, classification tasks, general reasoning, and knowledge recall. Further investigation into how these baseline tests differ from the benchmarks behind the model's described strengths would be worthwhile.

Model Pricing

Current Pricing

Feature      Price (per 1M tokens)
Prompt       $0.10
Completion   $0.20
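From the per-million-token rates above, the cost of a request can be estimated with a few lines of arithmetic. A minimal sketch (the function name is illustrative):

```python
# Cost estimator using the published rates for this model:
# $0.10 per 1M prompt tokens, $0.20 per 1M completion tokens.
PROMPT_RATE = 0.10       # USD per 1M prompt tokens
COMPLETION_RATE = 0.20   # USD per 1M completion tokens

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate request cost in USD from token counts."""
    return (prompt_tokens * PROMPT_RATE
            + completion_tokens * COMPLETION_RATE) / 1_000_000

# 1M prompt tokens plus 500K completion tokens:
print(f"${estimate_cost(1_000_000, 500_000):.2f}")  # prints "$0.20"
```

Note that reasoning models can emit long reasoning traces, which are billed as completion tokens, so completion-side costs tend to dominate.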


Available Endpoints
Provider   Endpoint Name                          Context Length   Pricing (Input)     Pricing (Output)
GMICloud   deepseek/deepseek-r1-distill-qwen-7b   131K             $0.10 / 1M tokens   $0.20 / 1M tokens
Benchmark Results