DeepSeek: Deepseek R1 0528 Qwen3 8B

Text input Text output Free Option
Author's Description

DeepSeek-R1-0528 is a lightly upgraded release of DeepSeek R1 that taps more compute and smarter post-training tricks, pushing its reasoning and inference to the brink of flagship models like O3 and Gemini 2.5 Pro. It now tops math, programming, and logic leaderboards, showcasing a step-change in depth-of-thought. The distilled variant, DeepSeek-R1-0528-Qwen3-8B, transfers this chain-of-thought into an 8 B-parameter form, beating standard Qwen3 8B by +10 pp and tying the 235 B “thinking” giant on AIME 2024.

Key Specifications
Cost
$$$
Context
131K
Parameters
8B
Released
May 29, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Seed Max Tokens Presence Penalty Frequency Penalty Include Reasoning Temperature Top P Min P Stop Reasoning
Features

This model supports the following features:

Reasoning
Performance Summary

DeepSeek-R1-0528-Qwen3-8B demonstrates a compelling blend of advanced capabilities within a compact 8B parameter footprint. While its speed performance places it in the 17th percentile, indicating longer response times compared to many models, it generally offers cost-effective solutions, ranking in the 62nd percentile for price. The model excels in complex reasoning and general knowledge, achieving 98.0% accuracy in Reasoning (96th percentile) and a perfect 100.0% in General Knowledge. Notably, it is the most accurate model at its price point for both these categories, and among the top three in accuracy for General Knowledge. Its performance in Coding is also strong at 89.0% accuracy (80th percentile). However, its accuracy in Ethics (98.0%, 47th percentile) and Email Classification (95.0%, 36th percentile) is more moderate, suggesting areas for potential refinement. Overall, DeepSeek-R1-0528-Qwen3-8B's key strength lies in its exceptional reasoning and knowledge acquisition, particularly impressive given its distilled nature. Its primary weakness is its slower processing speed. This model is well-suited for applications prioritizing accuracy in complex problem-solving and knowledge retrieval, where response time is a secondary concern.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.01
Completion $0.02

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Parasail
Parasail | deepseek/deepseek-r1-0528-qwen3-8b 131K $0.01 / 1M tokens $0.02 / 1M tokens
Novita
Novita | deepseek/deepseek-r1-0528-qwen3-8b 128K $0.01 / 1M tokens $0.02 / 1M tokens
Nineteen
Nineteen | deepseek/deepseek-r1-0528-qwen3-8b 32K $0.01 / 1M tokens $0.02 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Free Executions Accuracy Cost Duration
Other Models by deepseek