Qwen: Qwen2.5 VL 32B Instruct

Text input Image input Text output Free Option
Author's Description

Qwen2.5-VL-32B is a multimodal vision-language model fine-tuned through reinforcement learning for enhanced mathematical reasoning, structured outputs, and visual problem-solving capabilities. It excels at visual analysis tasks, including object recognition, textual interpretation within images, and precise event localization in extended videos. Qwen2.5-VL-32B demonstrates state-of-the-art performance across multimodal benchmarks such as MMMU, MathVista, and VideoMME, while maintaining strong reasoning and clarity in text-based tasks like MMLU, mathematical problem-solving, and code generation.

Key Specifications
Cost
$$$
Context
128K
Parameters
32B
Released
Mar 24, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Stop Presence Penalty Logit Bias Top P Temperature Structured Outputs Response Format Frequency Penalty Logprobs Max Tokens Top Logprobs
Features

This model supports the following features:

Structured Outputs Response Format
Performance Summary

Qwen2.5 VL 32B Instruct, a multimodal vision-language model, demonstrates a balanced performance profile with notable strengths in specific areas. It performs among the fastest models, typically ranking in the top tier for speed (61st percentile), and offers competitive pricing (48th percentile). The model exhibits exceptional reliability, boasting a 98% success rate, indicating consistent and usable responses. Across benchmarks, Qwen2.5 VL 32B shows strong performance in knowledge-based and ethical reasoning tasks, achieving perfect accuracy in Ethics and 98% in General Knowledge. Its visual analysis capabilities, as described, are likely contributing to its high performance in these areas, especially given its fine-tuning for enhanced mathematical reasoning and structured outputs. The model also performs well in Coding (84.0% accuracy), suggesting robust text-based reasoning. However, a significant weakness is observed in Instruction Following, where it achieved only 40.4% accuracy, indicating potential limitations in handling complex, multi-step directives. Reasoning performance is moderate at 62.0%. The model's efficiency is highlighted by its top-tier speed in Email Classification and its perfect accuracy in Ethics at a competitive price point and speed.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.9
Completion $0.9

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Fireworks
Fireworks | qwen/qwen2.5-vl-32b-instruct 128K $0.9 / 1M tokens $0.9 / 1M tokens
DeepInfra
DeepInfra | qwen/qwen2.5-vl-32b-instruct 128K $0.2 / 1M tokens $0.6 / 1M tokens
Chutes
Chutes | qwen/qwen2.5-vl-32b-instruct 16K $0.02 / 1M tokens $0.08 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Free Executions Accuracy Cost Duration
Other Models by qwen