Mistral: Devstral Small 1.1

Text input → Text output
Author's Description

Devstral Small 1.1 is a 24B-parameter open-weight language model for software engineering agents, developed by Mistral AI in collaboration with All Hands AI. Finetuned from Mistral Small 3.1 and released under the Apache 2.0 license, it features a 128k-token context window and supports both Mistral-style function calling and XML output formats. Designed for agentic coding workflows, Devstral Small 1.1 is optimized for tasks such as codebase exploration, multi-file edits, and integration into autonomous development agents like OpenHands and Cline. It achieves 53.6% on SWE-Bench Verified, surpassing all other open models on this benchmark, while remaining lightweight enough to run on a single RTX 4090 GPU or an Apple silicon machine. The model uses a Tekken tokenizer with a 131k vocabulary and is deployable via vLLM, Transformers, Ollama, LM Studio, and other OpenAI-compatible runtimes.
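Because the model is served through OpenAI-compatible runtimes, a Mistral-style function-calling request is just an ordinary chat-completion payload with a `tools` array. The sketch below only assembles the request body; the `list_files` tool is an invented example for an agentic coding setup, not something defined by the model card.

```python
# Sketch of a function-calling request for an OpenAI-compatible runtime
# (e.g. vLLM serving Devstral Small 1.1). The tool is illustrative only.
import json

def build_tool_call_request(user_prompt: str) -> dict:
    """Assemble a chat-completion payload that offers one tool."""
    return {
        "model": "mistralai/devstral-small-2507",
        "messages": [{"role": "user", "content": user_prompt}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "list_files",  # hypothetical agent tool
                    "description": "List files under a directory of the codebase.",
                    "parameters": {
                        "type": "object",
                        "properties": {"path": {"type": "string"}},
                        "required": ["path"],
                    },
                },
            }
        ],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }

payload = build_tool_call_request("Which files define the tokenizer?")
print(json.dumps(payload, indent=2))
```

Posting a body like this to the runtime's `/v1/chat/completions` endpoint lets the model either answer directly or return a tool call for the agent loop to execute.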

Key Specifications
Cost
$$
Context
131K
Parameters
24B
Released
Jul 10, 2025
Supported Parameters

This model supports the following parameters:

Stop, Presence Penalty, Tool Choice, Top P, Temperature, Seed, Tools, Structured Outputs, Response Format, Frequency Penalty, Max Tokens
Features

This model supports the following features:

Tools, Structured Outputs, Response Format
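Structured Outputs and Response Format are exposed through the standard OpenAI-style `response_format` field. Here is a minimal sketch; the bug-triage schema is an invented example chosen to fit the model's coding focus, not part of the model card.

```python
import json

# Hypothetical JSON schema constraining the model to a triage verdict.
review_schema = {
    "type": "object",
    "properties": {
        "severity": {"type": "string", "enum": ["low", "medium", "high"]},
        "summary": {"type": "string"},
    },
    "required": ["severity", "summary"],
}

# "response_format" follows the OpenAI-compatible structured-output shape.
payload = {
    "model": "mistralai/devstral-small-2507",
    "messages": [{"role": "user", "content": "Triage this stack trace: ..."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "bug_triage", "schema": review_schema, "strict": True},
    },
}

# The payload is plain JSON and can be posted to any compatible runtime.
print(json.dumps(payload))
```

With `strict` enforcement, the completion is guaranteed to parse against the schema, which is what makes the feature useful inside automated agent loops.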
Performance Summary

Mistral: Devstral Small 1.1 demonstrates a strong overall performance profile, particularly excelling in reliability and cost-efficiency. It consistently offers among the most competitive pricing, ranking in the 81st percentile across benchmarks, and exhibits exceptional reliability with a perfect 100th-percentile ranking, indicating virtually no technical failures. While generally in the top tier for speed (69th percentile), it notably achieved a top-3 speed ranking in the Instruction Following (Baseline) benchmark.

In terms of specific capabilities, Devstral Small 1.1 shows robust performance in Classification (98.0% accuracy in Email Classification) and General Knowledge (97.5% accuracy). Its core strength lies in Coding, where it achieved 85.0% accuracy in the Coding (Baseline) benchmark, in line with its design for software engineering agents.

Notable areas for improvement are Instruction Following (51.0% accuracy) and Reasoning (58.0% accuracy), both around the 50th percentile for their categories. Despite these weaknesses, its high reliability and cost-effectiveness make it a compelling choice for its intended agentic coding workflows.

Model Pricing

Current Pricing

Feature | Price (per 1M tokens)
Prompt | $0.10
Completion | $0.30
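At these rates, per-request cost follows directly from the token counts. A quick sketch using the prices in the table above (the example token counts are illustrative):

```python
# Prices from the pricing table, in dollars per million tokens.
PROMPT_PRICE = 0.10
COMPLETION_PRICE = 0.30

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Dollar cost of one request at the listed per-1M-token rates."""
    return (prompt_tokens * PROMPT_PRICE
            + completion_tokens * COMPLETION_PRICE) / 1_000_000

# Example: a 20k-token codebase context plus a 1k-token generated patch:
# 20_000 * 0.10 / 1e6 + 1_000 * 0.30 / 1e6 = 0.002 + 0.0003
print(f"${request_cost(20_000, 1_000):.4f}")  # → $0.0023
```

Even a full 131K-token context costs only about $0.013 in prompt tokens, which is what makes the model economical for long agentic sessions.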


Available Endpoints
Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output)
Mistral | mistralai/devstral-small-2507 | 131K | $0.10 / 1M tokens | $0.30 / 1M tokens
Parasail | mistralai/devstral-small-2507 | 131K | $0.07 / 1M tokens | $0.28 / 1M tokens
NextBit | mistralai/devstral-small-2507 | 131K | $0.07 / 1M tokens | $0.28 / 1M tokens
DeepInfra | mistralai/devstral-small-2507 | 128K | $0.07 / 1M tokens | $0.28 / 1M tokens