Qwen: Qwen2.5 VL 32B Instruct

Text input Image input Text output Free Option
Author's Description

Qwen2.5-VL-32B is a multimodal vision-language model fine-tuned through reinforcement learning for enhanced mathematical reasoning, structured outputs, and visual problem-solving capabilities. It excels at visual analysis tasks, including object recognition, textual interpretation within images, and precise event localization in extended videos. Qwen2.5-VL-32B demonstrates state-of-the-art performance across multimodal benchmarks such as MMMU, MathVista, and VideoMME, while maintaining strong reasoning and clarity in text-based tasks like MMLU, mathematical problem-solving, and code generation.

Key Specifications
Cost
$$$
Context
128K
Parameters
32B
Released
Mar 24, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Top Logprobs Logit Bias Structured Outputs Logprobs Response Format Stop Top P Max Tokens Frequency Penalty Temperature Presence Penalty
Features

This model supports the following features:

Response Format Structured Outputs
Performance Summary

Qwen2.5-VL-32B, created by qwen, is a multimodal vision-language model designed for enhanced mathematical reasoning, structured outputs, and visual problem-solving. With a context length of 128,000, it demonstrates competitive response times, ranking in the 59th percentile across benchmarks. The model also offers competitive pricing, placing in the 50th percentile. Notably, Qwen2.5-VL-32B exhibits exceptional reliability with a 95% success rate, indicating consistent and usable responses. In terms of performance across benchmarks, the model achieved perfect accuracy in Ethics, making it the most accurate model at its price point and among models of similar speed. It also performed strongly in General Knowledge (98.0% accuracy) and Email Classification (98.0% accuracy), with the latter offering a cost-effective solution. Its coding capabilities are solid, achieving 84.0% accuracy. However, the model shows notable weaknesses in Instruction Following (40.4% accuracy) and Reasoning (58.3% accuracy), with the latter also exhibiting a very long duration. Overall, Qwen2.5-VL-32B excels in visual analysis and ethical understanding, while its reasoning and instruction following capabilities could benefit from further refinement.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.9
Completion $0.9

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Fireworks
Fireworks | qwen/qwen2.5-vl-32b-instruct 128K $0.9 / 1M tokens $0.9 / 1M tokens
DeepInfra
DeepInfra | qwen/qwen2.5-vl-32b-instruct 128K $0.2 / 1M tokens $0.6 / 1M tokens
Chutes
Chutes | qwen/qwen2.5-vl-32b-instruct 16K $0.04 / 1M tokens $0.14 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by qwen