Qwen: Qwen3 VL 235B A22B Thinking

Text input Image input Text output
Author's Description

Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with visual understanding across images and video. The Thinking model is optimized for multimodal reasoning in STEM and math. The series emphasizes robust perception (recognition of diverse real-world and synthetic categories), spatial understanding (2D/3D grounding), and long-form visual comprehension, with competitive results on public multimodal benchmarks for both perception and reasoning. Beyond analysis, Qwen3-VL supports agentic interaction and tool use: it can follow complex instructions over multi-image, multi-turn dialogues; align text to video timelines for precise temporal queries; and operate GUI elements for automation tasks. The models also enable visual coding workflows, turning sketches or mockups into code and assisting with UI debugging, while maintaining strong text-only performance comparable to the flagship Qwen3 language models. This makes Qwen3-VL suitable for production scenarios spanning document AI, multilingual OCR, software/UI assistance, spatial/embodied tasks, and research on vision-language agents.

Key Specifications
Cost
$$$$$
Context
131K
Parameters
235B
Released
Sep 23, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Tools Include Reasoning Temperature Tool Choice Seed Response Format Max Tokens Structured Outputs Presence Penalty Reasoning Top P
Features

This model supports the following features:

Structured Outputs Reasoning Tools Response Format
Performance Summary

Qwen3-VL-235B-A22B Thinking, a multimodal model excelling in text generation and visual understanding, demonstrates exceptional speed, consistently ranking among the fastest models. It also offers highly competitive pricing across various benchmarks. The model's performance across different categories presents a mixed profile. It shows strong capabilities in Email Classification (97.8% accuracy) and a reasonable ability to acknowledge uncertainty in Hallucinations (95.2% accuracy). However, its performance in core reasoning and knowledge-based tasks is notably weak, with 0.0% accuracy in Coding, General Knowledge, and Ethics, and very low accuracy in Reasoning (3.4%) and Mathematics (3.1%). This suggests a significant disparity between its multimodal perception and its ability to apply that understanding to complex problem-solving or factual recall. While optimized for multimodal reasoning in STEM and math, the benchmark results do not yet reflect this specialization in quantitative or logical tasks. Its strengths lie in classification and uncertainty handling, alongside its advanced multimodal features like agentic interaction, tool use, and visual coding workflows.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.7
Completion $8.4

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Alibaba
Alibaba | qwen/qwen3-vl-235b-a22b-thinking 131K $0.7 / 1M tokens $8.4 / 1M tokens
Novita
Novita | qwen/qwen3-vl-235b-a22b-thinking 131K $0.3 / 1M tokens $3 / 1M tokens
Parasail
Parasail | qwen/qwen3-vl-235b-a22b-thinking 262K $0.5 / 1M tokens $3.5 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by qwen