DeepSeek: DeepSeek V3.1 Base

Modalities: text input → text output · Status: unavailable
Author's Description

This is a base model, trained only for raw next-token prediction. Unlike instruct/chat models, it has not been fine-tuned to follow user instructions, so prompts should be written like training text or worked examples rather than direct requests (e.g., “Translate the following sentence…” followed by the sentence, rather than just “Translate this”). DeepSeek-V3.1 Base is a 671B-parameter open Mixture-of-Experts (MoE) language model with 37B active parameters per forward pass and a 128K-token context length. Trained on 14.8T tokens with FP8 mixed precision, it achieves high training efficiency and stability, with strong performance on language, reasoning, math, and coding tasks.
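As a sketch of what “write prompts like training text” means in practice, the helper below builds a few-shot completion prompt: the model is shown a pattern to continue rather than given an instruction. The function name, the French→English task, and the label format are all illustrative assumptions, not part of any official API.

```python
def build_fewshot_prompt(examples, query):
    """Format (input, output) example pairs into a raw-completion prompt.

    Base models continue text; showing a repeated pattern steers the
    continuation, whereas a bare instruction ("Translate this") often
    will not.
    """
    lines = []
    for src, dst in examples:
        lines.append(f"French: {src}")
        lines.append(f"English: {dst}")
    lines.append(f"French: {query}")
    lines.append("English:")  # the model is expected to continue from here
    return "\n".join(lines)

prompt = build_fewshot_prompt(
    [("Bonjour.", "Hello."), ("Merci beaucoup.", "Thank you very much.")],
    "Bonne nuit.",
)
```

Ending the prompt mid-pattern (after “English:”) is the key trick: the most likely next tokens are the translation itself, and a newline stop sequence can then cut the generation off before the model invents further example pairs.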

Key Specifications
Cost
$$
Context
163K
Parameters
671B (Rumoured)
Released
Aug 20, 2025
Supported Parameters

This model supports the following parameters:

Stop, Max Tokens, Temperature, Min P, Top P, Logprobs, Top Logprobs, Frequency Penalty, Presence Penalty, Seed, Logit Bias
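A request body exercising these parameters might look like the sketch below. The field names follow common OpenAI-compatible completion-API conventions and are assumptions; they are not confirmed for any particular provider, and the values are arbitrary examples.

```python
# Hypothetical OpenAI-compatible completion request using the parameters
# listed above. Field names are assumed conventions, not a confirmed schema.
payload = {
    "model": "deepseek/deepseek-v3.1-base",
    "prompt": "French: Bonne nuit.\nEnglish:",
    "max_tokens": 32,          # Max Tokens: cap on generated tokens
    "temperature": 0.7,        # Temperature: sampling randomness
    "top_p": 0.95,             # Top P: nucleus sampling threshold
    "min_p": 0.05,             # Min P: drop tokens below this relative prob.
    "stop": ["\n"],            # Stop: end generation at the next newline
    "frequency_penalty": 0.0,  # Frequency Penalty: discourage repetition
    "presence_penalty": 0.0,   # Presence Penalty: discourage reused tokens
    "seed": 42,                # Seed: best-effort reproducibility
    "logprobs": True,          # Logprobs: return token log-probabilities
    "top_logprobs": 5,         # Top Logprobs: alternatives per position
    "logit_bias": {},          # Logit Bias: token-id -> additive bias
}
```

For a base model, `stop` sequences matter more than usual: without a chat template to end the turn, generation otherwise continues until `max_tokens` is exhausted.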
Performance Summary

DeepSeek-V3.1 Base, a 671B-parameter open MoE model with 37B active parameters, demonstrates exceptional speed and cost efficiency, consistently ranking among the fastest models and offering some of the most competitive pricing across all evaluated benchmarks. As a base model, it is designed for raw next-token prediction rather than instruction following, which is evident in its 0.0% accuracy on Instruction Following. The model shows notable strength in Coding (58.3% accuracy) and strong cost efficiency across most benchmarks, particularly General Knowledge, Ethics, and Coding. However, its accuracy is generally low on knowledge-based and reasoning tasks: 36.4% in General Knowledge, 50.0% in Ethics, and only 2.2% in Reasoning. Mathematics and Email Classification also presented significant challenges, at 11.7% and 33.3% respectively. This profile suggests the model is best suited to tasks where raw text generation and pattern completion matter most, rather than instruction adherence or deep reasoning.

Model Pricing

Current Pricing

Feature      Price (per 1M tokens)
Prompt       $0.25
Completion   $1.00
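At these rates, the cost of a request is a straightforward per-token sum. The helper below is a minimal sketch (constant and function names are my own) assuming the listed prices of $0.25 and $1.00 per 1M prompt and completion tokens.

```python
PROMPT_PRICE = 0.25      # USD per 1M prompt tokens
COMPLETION_PRICE = 1.00  # USD per 1M completion tokens

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed rates."""
    return (prompt_tokens * PROMPT_PRICE
            + completion_tokens * COMPLETION_PRICE) / 1_000_000

# e.g. a 100K-token prompt with a 2K-token completion:
# 100_000 * 0.25/1M + 2_000 * 1.00/1M = 0.025 + 0.002 = 0.027 USD
```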

Available Endpoints
Provider  Endpoint Name                Context Length  Pricing (Input)    Pricing (Output)
Chutes    deepseek/deepseek-v3.1-base  163K            $0.25 / 1M tokens  $1 / 1M tokens