Qwen: Qwen3 VL 235B A22B Thinking

Text input Image input Text output
Author's Description

Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with visual understanding across images and video. The Thinking model is optimized for multimodal reasoning in STEM and math. The series emphasizes robust perception (recognition of diverse real-world and synthetic categories), spatial understanding (2D/3D grounding), and long-form visual comprehension, with competitive results on public multimodal benchmarks for both perception and reasoning. Beyond analysis, Qwen3-VL supports agentic interaction and tool use: it can follow complex instructions over multi-image, multi-turn dialogues; align text to video timelines for precise temporal queries; and operate GUI elements for automation tasks. The models also enable visual coding workflows, turning sketches or mockups into code and assisting with UI debugging, while maintaining strong text-only performance comparable to the flagship Qwen3 language models. This makes Qwen3-VL suitable for production scenarios spanning document AI, multilingual OCR, software/UI assistance, spatial/embodied tasks, and research on vision-language agents.

Key Specifications
Cost
$$$$$
Context
131K
Parameters
235B
Released
Sep 23, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Temperature Response Format Include Reasoning Seed Tool Choice Reasoning Max Tokens Top P Presence Penalty Tools
Features

This model supports the following features:

Tools Response Format Reasoning
Performance Summary

Qwen3 VL 235B A22B Thinking consistently ranks among the fastest models and offers highly competitive pricing across various benchmarks. This multimodal model, designed for strong text generation and visual understanding, demonstrates notable strengths in specific areas. It achieves a high accuracy of 95.2% in Hallucinations (Baseline) testing, indicating a strong ability to acknowledge uncertainty, and an impressive 81.6% in Instruction Following (Baseline), showcasing its precision in adhering to complex directives. The model also performs well in Email Classification (Baseline) with 97.8% accuracy. However, the model exhibits significant weaknesses in core cognitive benchmarks. It scores 0.0% accuracy in Coding (Baseline), General Knowledge (Baseline), and Ethics (Baseline), suggesting a lack of foundational understanding or an inability to perform well in these specific multiple-choice formats. Its performance in Reasoning (Baseline) and Mathematics (Baseline) is also very low, at 3.4% and 3.1% accuracy respectively, despite its description emphasizing optimization for multimodal reasoning in STEM and math. While its speed and cost efficiency are exceptional, the model's current performance across several critical reasoning and knowledge-based tasks indicates areas requiring substantial improvement.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.26
Completion $2.6

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Alibaba
Alibaba | qwen/qwen3-vl-235b-a22b-thinking 131K $0.26 / 1M tokens $2.6 / 1M tokens
Novita
Novita | qwen/qwen3-vl-235b-a22b-thinking 131K $0.26 / 1M tokens $2.6 / 1M tokens
Parasail
Parasail | qwen/qwen3-vl-235b-a22b-thinking 65K $0.26 / 1M tokens $2.6 / 1M tokens
Parasail
Parasail | qwen/qwen3-vl-235b-a22b-thinking 262K $0.26 / 1M tokens $2.6 / 1M tokens
SiliconFlow
SiliconFlow | qwen/qwen3-vl-235b-a22b-thinking 262K $0.45 / 1M tokens $3.5 / 1M tokens
Chutes
Chutes | qwen/qwen3-vl-235b-a22b-thinking 262K $0.26 / 1M tokens $2.6 / 1M tokens
Novita
Novita | qwen/qwen3-vl-235b-a22b-thinking 131K $0.98 / 1M tokens $3.95 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by qwen