Mistral: Devstral 2 2512

Text input Text output
Author's Description

Devstral 2 is a state-of-the-art open-source model by Mistral AI specializing in agentic coding. It is a 123B-parameter dense transformer model supporting a 256K context window. Devstral 2 supports exploring codebases and orchestrating changes across multiple files while maintaining architecture-level context. It tracks framework dependencies, detects failures, and retries with corrections—solving challenges like bug fixing and modernizing legacy systems. The model can be fine-tuned to prioritize specific languages or optimize for large enterprise codebases. It is available under a modified MIT license.

Key Specifications
Cost
$$
Context
262K
Parameters
123B (Rumoured)
Released
Dec 09, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Presence Penalty Tools Response Format Structured Outputs Temperature Top P Frequency Penalty Tool Choice Max Tokens Seed Stop
Features

This model supports the following features:

Structured Outputs Tools Response Format
Performance Summary

Mistral: Devstral 2 2512 demonstrates moderate speed performance, ranking in the 40th percentile across benchmarks. It offers cost-effective solutions, placing in the 71st percentile for price. A standout feature is its exceptional reliability, achieving a 99% success rate, indicating consistent and usable responses. In terms of specific performance, Devstral 2 excels in acknowledging uncertainty, achieving perfect accuracy in Hallucinations (Baseline) tests, making it the most accurate model at its price point and speed. It shows strong general knowledge with 98.5% accuracy and solid performance in coding (88.0% accuracy) and instruction following (62.0% accuracy). However, its performance in Ethics (Baseline) is comparatively lower at 96.0% accuracy, placing it in the 28th percentile, and it exhibits a significantly longer duration for this benchmark. Overall, Devstral 2's key strengths lie in its specialized agentic coding capabilities, high reliability, and cost-effectiveness, particularly for tasks requiring precise uncertainty acknowledgment. Its primary area for improvement appears to be in the efficiency of ethical reasoning tasks.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.4
Completion $2

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Chutes
Chutes | mistralai/devstral-2512 262K $0.4 / 1M tokens $2 / 1M tokens
Mistral
Mistral | mistralai/devstral-2512 262K $0.4 / 1M tokens $2 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by mistralai