Qwen: Qwen3 VL 235B A22B Thinking

Text input Image input Text output
Author's Description

Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with visual understanding across images and video. The Thinking model is optimized for multimodal reasoning in STEM and math. The series emphasizes robust perception (recognition of diverse real-world and synthetic categories), spatial understanding (2D/3D grounding), and long-form visual comprehension, with competitive results on public multimodal benchmarks for both perception and reasoning. Beyond analysis, Qwen3-VL supports agentic interaction and tool use: it can follow complex instructions over multi-image, multi-turn dialogues; align text to video timelines for precise temporal queries; and operate GUI elements for automation tasks. The models also enable visual coding workflows, turning sketches or mockups into code and assisting with UI debugging, while maintaining strong text-only performance comparable to the flagship Qwen3 language models. This makes Qwen3-VL suitable for production scenarios spanning document AI, multilingual OCR, software/UI assistance, spatial/embodied tasks, and research on vision-language agents.

Key Specifications
Cost
$$$$$
Context
131K
Parameters
235B
Released
Sep 23, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Top P Reasoning Seed Tool Choice Structured Outputs Temperature Response Format Max Tokens Tools Presence Penalty Include Reasoning
Features

This model supports the following features:

Structured Outputs Response Format Reasoning Tools
Performance Summary

Qwen3-VL-235B-A22B Thinking consistently ranks among the fastest models and offers highly competitive pricing across various benchmarks. This multimodal model, designed for strong text generation and visual understanding, demonstrates particular strength in Instruction Following, achieving 81.6% accuracy (93rd percentile), indicating a robust ability to process and execute complex directives. Its Email Classification performance is also strong at 97.8% accuracy (51st percentile), showcasing effective contextual understanding. The model exhibits a reasonable ability to acknowledge uncertainty, with 95.2% accuracy in the Hallucinations benchmark (51st percentile). However, significant weaknesses are apparent in core cognitive areas. Performance in Coding, General Knowledge, and Ethics benchmarks is 0.0% accuracy, suggesting a complete inability to address these types of questions. Reasoning and Mathematics also show very low accuracy at 3.4% (5th percentile) and 3.1% (9th percentile) respectively, indicating substantial limitations in complex problem-solving and quantitative analysis. While optimized for multimodal reasoning in STEM and math, the benchmark results do not reflect this specialization in the tested categories. The model's strengths lie in its operational efficiency and specific classification/instruction adherence rather than broad knowledge or advanced reasoning.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.3
Completion $1.2

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Alibaba
Alibaba | qwen/qwen3-vl-235b-a22b-thinking 131K $0.3 / 1M tokens $1.2 / 1M tokens
Novita
Novita | qwen/qwen3-vl-235b-a22b-thinking 131K $0.3 / 1M tokens $1.2 / 1M tokens
Parasail
Parasail | qwen/qwen3-vl-235b-a22b-thinking 65K $0.3 / 1M tokens $1.2 / 1M tokens
Parasail
Parasail | qwen/qwen3-vl-235b-a22b-thinking 262K $0.3 / 1M tokens $1.2 / 1M tokens
SiliconFlow
SiliconFlow | qwen/qwen3-vl-235b-a22b-thinking 262K $0.45 / 1M tokens $3.5 / 1M tokens
Chutes
Chutes | qwen/qwen3-vl-235b-a22b-thinking 262K $0.3 / 1M tokens $1.2 / 1M tokens
Novita
Novita | qwen/qwen3-vl-235b-a22b-thinking 131K $0.784 / 1M tokens $3.16 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by qwen