Qwen: Qwen3 VL 32B Instruct

Image input Text input Text output
Author's Description

Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text comprehension, enabling fine-grained spatial reasoning, document and scene analysis, and long-horizon video understanding.Robust OCR in 32 languages, and enhanced multimodal fusion through Interleaved-MRoPE and DeepStack architectures. Optimized for agentic interaction and visual tool use, Qwen3-VL-32B delivers state-of-the-art performance for complex real-world multimodal tasks.

Key Specifications
Cost
$$
Context
262K
Parameters
32B
Released
Oct 23, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Min P Stop Structured Outputs Response Format Presence Penalty Logit Bias Seed Frequency Penalty Temperature Top P Max Tokens
Features

This model supports the following features:

Response Format Structured Outputs
Performance Summary

Qwen3-VL-32B-Instruct demonstrates competitive response times, performing in the 54th percentile across various benchmarks. It offers highly cost-effective solutions, ranking in the 80th percentile for price. Notably, the model exhibits exceptional reliability with a 100% success rate across all evaluated benchmarks, indicating minimal technical failures. The model excels in several critical areas, achieving perfect accuracy in General Knowledge, Ethics, Mathematics, Reasoning, and Coding benchmarks. Its performance in Mathematics is particularly outstanding, ranking #1 in accuracy and being the most accurate model at its price point and speed. Similarly, in Reasoning and Coding, it stands among the top 3 in accuracy with excellent cost-efficiency. While its Hallucinations accuracy is strong at 95.5%, there's a slight room for improvement. The model also shows strong instruction following capabilities, ranking in the top 3 for speed in this category. Its Email Classification accuracy is solid, though not a top performer. Overall, Qwen3-VL-32B-Instruct's key strengths lie in its high accuracy across complex cognitive tasks and its remarkable reliability, making it a robust and cost-efficient solution for demanding multimodal applications.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.35
Completion $1.1

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Parasail
Parasail | qwen/qwen3-vl-32b-instruct 262K $0.35 / 1M tokens $1.1 / 1M tokens
Parasail
Parasail | qwen/qwen3-vl-32b-instruct 262K $0.35 / 1M tokens $1.1 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by qwen