Author's Description
Note that this is a base model mostly meant for testing, you need to provide detailed prompts for the model to return useful responses. DeepSeek-V3 Base is a 671B parameter open Mixture-of-Experts (MoE) language model with 37B active parameters per forward pass and a context length of 128K tokens. Trained on 14.8T tokens using FP8 mixed precision, it achieves high training efficiency and stability, with strong performance across language, reasoning, math, and coding tasks. DeepSeek-V3 Base is the pre-trained model behind [DeepSeek V3](/deepseek/deepseek-chat-v3)
Key Specifications
Supported Parameters
This model supports the following parameters:
Performance Summary
DeepSeek V3 Base, a 671B parameter MoE model with a 128K token context length, consistently ranks among the fastest models available, demonstrating exceptional speed across all evaluated benchmarks. It also offers highly competitive pricing, positioning it as one of the most cost-effective options. However, its performance across various benchmark categories indicates significant limitations in accuracy. The model achieved very low accuracy scores, with 24.1% in Coding, 17.6% in General Knowledge, 13.0% in Ethics, and 10.0% in Email Classification. It scored 0.0% in both Instruction Following and Reasoning, suggesting a fundamental inability to handle these types of tasks in its current base form. While its speed and cost efficiency are notable strengths, the extremely low accuracy across all tested domains, particularly in critical areas like reasoning and instruction following, represents a significant weakness. This model appears to be primarily suited for testing and development, requiring detailed prompting as noted in its description, rather than for general-purpose application where high accuracy is paramount.
Model Pricing
Current Pricing
Feature | Price (per 1M tokens) |
---|---|
Prompt | $0.2 |
Completion | $0.8 |
Price History
Available Endpoints
Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
---|---|---|---|---|
Chutes
|
Chutes | deepseek/deepseek-v3-base | 163K | $0.2 / 1M tokens | $0.8 / 1M tokens |
Benchmark Results
Benchmark | Category | Reasoning | Free | Executions | Accuracy | Cost | Duration |
---|
Other Models by deepseek
|
Released | Params | Context |
|
Speed | Ability | Cost |
---|---|---|---|---|---|---|---|
DeepSeek: R1 Distill Qwen 7B Unavailable | May 30, 2025 | 7B | 131K |
Text input
Text output
|
★ | ★ | $$$$ |
DeepSeek: Deepseek R1 0528 Qwen3 8B | May 29, 2025 | 8B | 131K |
Text input
Text output
|
★★★ | ★★★ | $$ |
DeepSeek: R1 0528 | May 28, 2025 | ~671B | 128K |
Text input
Text output
|
★★★ | ★★★ | $$$ |
DeepSeek: DeepSeek Prover V2 | Apr 30, 2025 | ~671B | 131K |
Text input
Text output
|
★★ | ★★★★★ | $$$$ |
DeepSeek: DeepSeek V3 0324 | Mar 24, 2025 | ~685B | 163K |
Text input
Text output
|
★★★★ | ★★★★★ | $$ |
DeepSeek: R1 Distill Llama 8B | Feb 07, 2025 | 8B | 32K |
Text input
Text output
|
★ | ★★ | $$ |
DeepSeek: R1 Distill Qwen 1.5B | Jan 31, 2025 | 5B | 131K |
Text input
Text output
|
★★★ | ★ | $$$ |
DeepSeek: R1 Distill Qwen 32B | Jan 29, 2025 | 32B | 131K |
Text input
Text output
|
★ | ★★★★★ | $$$ |
DeepSeek: R1 Distill Qwen 14B | Jan 29, 2025 | 14B | 64K |
Text input
Text output
|
★ | ★★ | $$$ |
DeepSeek: R1 Distill Llama 70B | Jan 23, 2025 | 70B | 131K |
Text input
Text output
|
★★★ | ★★★★★ | $$ |
DeepSeek: R1 | Jan 20, 2025 | ~671B | 128K |
Text input
Text output
|
★★★ | ★★★★★ | $$$ |
DeepSeek: DeepSeek V3 | Dec 26, 2024 | — | 163K |
Text input
Text output
|
★★★ | ★★★★★ | $$$ |