Qwen: Qwen3 235B A22B

Text input Text output
Author's Description

Qwen3-235B-A22B is a 235B parameter mixture-of-experts (MoE) model developed by Qwen, activating 22B parameters per forward pass. It supports seamless switching between a "thinking" mode for complex reasoning, math, and code tasks, and a "non-thinking" mode for general conversational efficiency. The model demonstrates strong reasoning ability, multilingual support (100+ languages and dialects), advanced instruction-following, and agent tool-calling capabilities. It natively handles a 32K token context window and extends up to 131K tokens using YaRN-based scaling.

Key Specifications
Cost
$$$$
Context
40K
Parameters
235B
Released
Apr 28, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Tool Choice Reasoning Include Reasoning Response Format Seed Top P Temperature Tools Stop Min P Max Tokens Frequency Penalty Presence Penalty
Features

This model supports the following features:

Tools Reasoning Response Format
Performance Summary

Qwen3-235B-A22B, a 235B parameter MoE model, exhibits a strong performance profile with notable strengths in accuracy and reliability, despite slower response times. It ranks in the 8th percentile for speed, indicating longer processing durations across benchmarks. Conversely, its pricing is moderate, falling within the 34th percentile. A standout feature is its exceptional reliability, achieving a 100% success rate across all evaluated benchmarks, signifying minimal technical failures. The model demonstrates outstanding accuracy in Coding (97.0%), General Knowledge (100.0%), Email Classification (100.0%), and Reasoning (98.0%). It achieved perfect scores in General Knowledge and Email Classification, often being the most accurate model at its price point and speed. While its Instruction Following accuracy (40.4%) is a weakness, it excels in complex tasks like Coding and Reasoning, aligning with its "thinking" mode capabilities. Its multilingual support and advanced instruction-following are key features, though the latter's benchmark performance suggests room for improvement in highly complex, multi-layered instructions.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.18
Completion $0.54

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
DeepInfra
DeepInfra | qwen/qwen3-235b-a22b-04-28 40K $0.18 / 1M tokens $0.54 / 1M tokens
Parasail
Parasail | qwen/qwen3-235b-a22b-04-28 40K $0.18 / 1M tokens $0.54 / 1M tokens
Kluster
Kluster | qwen/qwen3-235b-a22b-04-28 40K $0.18 / 1M tokens $0.54 / 1M tokens
GMICloud
GMICloud | qwen/qwen3-235b-a22b-04-28 32K $0.18 / 1M tokens $0.54 / 1M tokens
Together
Together | qwen/qwen3-235b-a22b-04-28 40K $0.2 / 1M tokens $0.6 / 1M tokens
Nebius
Nebius | qwen/qwen3-235b-a22b-04-28 40K $0.18 / 1M tokens $0.54 / 1M tokens
Novita
Novita | qwen/qwen3-235b-a22b-04-28 40K $0.18 / 1M tokens $0.54 / 1M tokens
Fireworks
Fireworks | qwen/qwen3-235b-a22b-04-28 131K $0.22 / 1M tokens $0.88 / 1M tokens
Friendli
Friendli | qwen/qwen3-235b-a22b-04-28 131K $0.18 / 1M tokens $0.54 / 1M tokens
Cerebras
Cerebras | qwen/qwen3-235b-a22b-04-28 40K $0.18 / 1M tokens $0.54 / 1M tokens
Chutes
Chutes | qwen/qwen3-235b-a22b-04-28 40K $0.18 / 1M tokens $0.54 / 1M tokens
Chutes
Chutes | qwen/qwen3-235b-a22b-04-28 40K $0.25 / 1M tokens $1 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by qwen