Qwen: Qwen3 235B A22B Thinking 2507

Modalities: text input → text output
Author's Description

Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) language model optimized for complex reasoning tasks. It activates 22B of its 235B parameters per forward pass and natively supports a context of up to 262,144 tokens. This "thinking-only" variant is tuned for structured logical reasoning, mathematics, science, and long-form generation, and shows strong benchmark performance on AIME, SuperGPQA, LiveCodeBench, and MMLU-Redux. It operates exclusively in a reasoning mode: the chat template injects the opening <think> tag automatically, so outputs typically contain only the closing </think> tag. It is designed for long outputs (up to 81,920 tokens) in challenging domains. The model is instruction-tuned and excels at step-by-step reasoning, tool use, agentic workflows, and multilingual tasks. This release is the most capable open-source variant in the Qwen3-235B series, surpassing many closed models in structured reasoning use cases.
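Because the chat template injects the opening <think> tag automatically, a raw completion from this model usually contains only the closing </think> marker before the final answer. A minimal sketch of separating the reasoning trace from the answer (the function name and sample string are illustrative, not part of any official SDK):

```python
def split_thinking(output: str) -> tuple[str, str]:
    """Split a raw completion into (reasoning, answer).

    The chat template prepends <think> automatically, so the raw text
    typically contains only the closing </think> marker.
    """
    marker = "</think>"
    if marker in output:
        reasoning, _, answer = output.partition(marker)
        return reasoning.strip(), answer.strip()
    # No marker: treat the whole output as the final answer.
    return "", output.strip()

# Hypothetical raw output for illustration
raw = "First, factor the expression...\n</think>\nThe answer is 42."
reasoning, answer = split_thinking(raw)
```

Splitting on the first occurrence of the marker is deliberate: anything before it is the hidden reasoning trace, and everything after is the user-facing answer.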

Key Specifications
Cost: $$$$$
Context: 131K
Parameters: 235B
Released: Jul 25, 2025
Speed, Ability, Reliability: shown as rating meters on the original page (values not captured; see Performance Summary below)
Supported Parameters

This model supports the following parameters:

Include Reasoning, Presence Penalty, Tool Choice, Top P, Temperature, Seed, Tools, Response Format, Reasoning, Max Tokens
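The parameters above map onto the common OpenAI-style chat-completions request body. A hedged sketch of a request exercising them; the field names follow that convention and are assumptions, not confirmed for every provider:

```python
# Hypothetical chat-completions request body; field names follow the
# widely used OpenAI-compatible convention and may differ per provider.
request = {
    "model": "qwen/qwen3-235b-a22b-thinking-2507",
    "messages": [
        {"role": "user", "content": "Prove that sqrt(2) is irrational."}
    ],
    "temperature": 0.6,           # sampling temperature
    "top_p": 0.95,                # nucleus sampling cutoff
    "presence_penalty": 0.0,      # penalize already-mentioned tokens
    "seed": 1234,                 # best-effort reproducibility
    "max_tokens": 32768,          # leave headroom for long reasoning traces
    "response_format": {"type": "text"},
    "reasoning": {"enabled": True},  # request the reasoning trace in the response
    "tools": [],                  # tool definitions (none in this example)
    "tool_choice": "auto",        # let the model decide whether to call a tool
}
```

A generous `max_tokens` matters more here than for most models, since the reasoning trace alone can run to tens of thousands of tokens.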
Features

This model supports the following features:

Tools, Reasoning, Response Format
Performance Summary

Qwen3-235B-A22B-Thinking-2507 demonstrates a strong overall performance profile, particularly excelling in specialized reasoning and knowledge-based tasks. While its speed ranking places it among models with longer response times (14th percentile) and its price ranking indicates premium pricing levels (6th percentile), these are often justified by its exceptional capabilities. The model exhibits outstanding reliability, achieving a perfect 100th percentile, meaning it consistently provides usable responses with minimal technical failures.

Across benchmarks, Qwen3-235B-A22B-Thinking-2507 shows remarkable accuracy in Coding (98.0%, 100th percentile) and General Knowledge (100.0%, perfect accuracy), often leading its price and speed categories in these domains. Its Reasoning performance is also strong at 90.0% accuracy (87th percentile).

However, its Instruction Following accuracy is a notable weakness at 26.3% (25th percentile), suggesting challenges with complex multi-step directives despite its "thinking-only" design; the model's high cost and long duration on that benchmark further mark it as an area for improvement. Ethics performance is moderate at 98.0% (40th percentile), and Email Classification is solid at 99.0% (72nd percentile). The model's core strength lies in structured logical reasoning and knowledge retrieval, making it highly suitable for demanding analytical applications.

Model Pricing

Current Pricing

Feature | Price (per 1M tokens)
Prompt | $0.078
Completion | $0.312
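Per-million-token pricing converts to per-request cost by scaling each side of the exchange separately. A small sketch using the prompt/completion prices above:

```python
# Prices from the table above, converted to USD per single token.
PROMPT_PRICE = 0.078 / 1_000_000      # $0.078 per 1M prompt tokens
COMPLETION_PRICE = 0.312 / 1_000_000  # $0.312 per 1M completion tokens

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated USD cost of one request at the listed rates."""
    return prompt_tokens * PROMPT_PRICE + completion_tokens * COMPLETION_PRICE

# Reasoning models are output-heavy: 10K prompt + 40K completion tokens
cost = request_cost(10_000, 40_000)  # ≈ $0.0133
```

Note the 4:1 completion-to-prompt price ratio: for a thinking model that emits long reasoning traces, completion tokens dominate the bill.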

Available Endpoints
Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output)
Alibaba | qwen/qwen3-235b-a22b-thinking-2507 | 131K | $0.70 / 1M tokens | $8.40 / 1M tokens
Novita | qwen/qwen3-235b-a22b-thinking-2507 | 131K | $0.078 / 1M tokens | $0.312 / 1M tokens
Chutes | qwen/qwen3-235b-a22b-thinking-2507 | 262K | $0.078 / 1M tokens | $0.312 / 1M tokens
Novita | qwen/qwen3-235b-a22b-thinking-2507 | 131K | $0.30 / 1M tokens | $3.00 / 1M tokens
DeepInfra | qwen/qwen3-235b-a22b-thinking-2507 | 262K | $0.13 / 1M tokens | $0.60 / 1M tokens
Parasail | qwen/qwen3-235b-a22b-thinking-2507 | 262K | $0.65 / 1M tokens | $3.00 / 1M tokens
Together | qwen/qwen3-235b-a22b-thinking-2507 | 262K | $0.65 / 1M tokens | $3.00 / 1M tokens
Crusoe | qwen/qwen3-235b-a22b-thinking-2507 | 262K | $0.078 / 1M tokens | $0.312 / 1M tokens
Cerebras | qwen/qwen3-235b-a22b-thinking-2507 | 131K | $0.60 / 1M tokens | $1.20 / 1M tokens
GMICloud | qwen/qwen3-235b-a22b-thinking-2507 | 131K | $0.60 / 1M tokens | $3.00 / 1M tokens
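Because the same model is served at very different rates and context lengths, endpoint choice is itself a cost decision. A sketch of ranking the endpoints in the table for a given workload (the helper and its signature are illustrative):

```python
# Endpoint rows from the table above: (provider, context, input $/1M, output $/1M).
endpoints = [
    ("Alibaba",   131_000, 0.70,  8.40),
    ("Novita",    131_000, 0.078, 0.312),
    ("Chutes",    262_000, 0.078, 0.312),
    ("Novita",    131_000, 0.30,  3.00),
    ("DeepInfra", 262_000, 0.13,  0.60),
    ("Parasail",  262_000, 0.65,  3.00),
    ("Together",  262_000, 0.65,  3.00),
    ("Crusoe",    262_000, 0.078, 0.312),
    ("Cerebras",  131_000, 0.60,  1.20),
    ("GMICloud",  131_000, 0.60,  3.00),
]

def cheapest(endpoints, prompt_mtok, completion_mtok, min_context=0):
    """Lowest-cost endpoint for a workload measured in millions of
    prompt/completion tokens, optionally requiring a minimum context."""
    eligible = [e for e in endpoints if e[1] >= min_context]
    return min(eligible, key=lambda e: prompt_mtok * e[2] + completion_mtok * e[3])

# Cheapest 262K-context endpoint for 1M prompt + 4M completion tokens
best = cheapest(endpoints, 1.0, 4.0, min_context=262_000)
```

Price alone is not the whole story (speed, quantization, and reliability differ per provider), but for output-heavy reasoning workloads the output rate dominates the comparison.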