Author's Description
Qwen2.5 VL 7B is a multimodal LLM from the Qwen Team with the following key enhancements: - SoTA understanding of images of various resolution & ratio: Qwen2.5-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc. - Understanding videos of 20min+: Qwen2.5-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc. - Agent that can operate your mobiles, robots, etc.: with the abilities of complex reasoning and decision making, Qwen2.5-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions. - Multilingual Support: to serve global users, besides English and Chinese, Qwen2.5-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc. For more details, see this [blog post](https://qwenlm.github.io/blog/qwen2-vl/) and [GitHub repo](https://github.com/QwenLM/Qwen2-VL). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
Key Specifications
Supported Parameters
This model supports the following parameters:
Performance Summary
Qwen2.5-VL 7B Instruct demonstrates a balanced performance profile, positioning itself as a reliable and cost-effective multimodal AI model. It performs among the faster models, ranking in the 66th percentile for speed, and offers competitive pricing, placing in the 71st percentile for cost-effectiveness. Notably, its reliability is exceptional, achieving the 98th percentile, indicating minimal technical failures and consistent response delivery. Across benchmarks, Qwen2.5-VL 7B shows particular strength in Ethics, achieving perfect 100% accuracy, making it the most accurate model at its price point and among models of similar speed. It also performs well in General Knowledge (91.8% accuracy) and Email Classification (92.0% accuracy), though its percentile rankings in these areas suggest room for improvement compared to top performers. Weaknesses are apparent in Coding (71.0% accuracy, 37th percentile) and Reasoning (48.0% accuracy, 35th percentile), where its performance is below average. Instruction Following is moderate at 51.5% accuracy. Its key strengths lie in its robust reliability, cost-efficiency, and specialized excellence in ethical reasoning, while its multimodal capabilities for image and video understanding, and agentic operations, as described in its details, are significant features not directly captured by these text-based benchmarks.
Model Pricing
Current Pricing
Feature | Price (per 1M tokens) |
---|---|
Prompt | $0.2 |
Completion | $0.2 |
Price History
Available Endpoints
Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
---|---|---|---|---|
Hyperbolic
|
Hyperbolic | qwen/qwen-2-vl-7b-instruct | 32K | $0.2 / 1M tokens | $0.2 / 1M tokens |
InferenceNet
|
InferenceNet | qwen/qwen-2-vl-7b-instruct | 128K | $0.2 / 1M tokens | $0.2 / 1M tokens |
Kluster
|
Kluster | qwen/qwen-2-vl-7b-instruct | 32K | $0.2 / 1M tokens | $0.2 / 1M tokens |
Benchmark Results
Benchmark | Category | Reasoning | Free | Executions | Accuracy | Cost | Duration |
---|
Other Models by qwen
|
Released | Params | Context |
|
Speed | Ability | Cost |
---|---|---|---|---|---|---|---|
Qwen: Qwen3 30B A3B Instruct 2507 | Jul 29, 2025 | 30B | 131K |
Text input
Text output
|
★★★★ | ★★★★ | $$$ |
Qwen: Qwen3 235B A22B Thinking 2507 | Jul 25, 2025 | 235B | 131K |
Text input
Text output
|
★ | ★★★★ | $$$$$ |
Qwen: Qwen3 Coder | Jul 22, 2025 | 480B | 1M |
Text input
Text output
|
★★★★ | ★★★ | $$$ |
Qwen: Qwen3 235B A22B Instruct 2507 | Jul 21, 2025 | 235B | 262K |
Text input
Text output
|
★ | ★★★ | $$$ |
Qwen: Qwen3 30B A3B | Apr 28, 2025 | 30B | 40K |
Text input
Text output
|
★ | ★★★★★ | $$$$ |
Qwen: Qwen3 8B | Apr 28, 2025 | 8B | 128K |
Text input
Text output
|
★ | ★★★ | $$$ |
Qwen: Qwen3 14B | Apr 28, 2025 | 14B | 40K |
Text input
Text output
|
★★ | ★★★★★ | $$$ |
Qwen: Qwen3 32B | Apr 28, 2025 | 32B | 40K |
Text input
Text output
|
★ | ★★★★★ | $$$ |
Qwen: Qwen3 235B A22B | Apr 28, 2025 | 235B | 40K |
Text input
Text output
|
★ | ★★★★ | $$$$ |
Qwen: Qwen2.5 VL 32B Instruct | Mar 24, 2025 | 32B | 128K |
Text input
Image input
Text output
|
★★ | ★★★ | $$$ |
Qwen: QwQ 32B | Mar 05, 2025 | 32B | 131K |
Text input
Text output
|
★ | ★★★ | $$$ |
Qwen: Qwen VL Plus | Feb 04, 2025 | — | 7K |
Text input
Image input
Text output
|
★★★★ | ★★ | $$$ |
Qwen: Qwen VL Max | Feb 01, 2025 | — | 7K |
Text input
Image input
Text output
|
★★★★★ | ★★★ | $$$$ |
Qwen: Qwen-Turbo | Feb 01, 2025 | — | 1M |
Text input
Text output
|
★★★★★ | ★★★★ | $$ |
Qwen: Qwen2.5 VL 72B Instruct | Feb 01, 2025 | 72B | 32K |
Text input
Image input
Text output
|
★★★★ | ★★★★ | $$ |
Qwen: Qwen-Plus | Feb 01, 2025 | — | 131K |
Text input
Text output
|
★★★ | ★★★★ | $$$ |
Qwen: Qwen-Max | Feb 01, 2025 | — | 32K |
Text input
Text output
|
★★★ | ★★★★ | $$$$ |
Qwen: QwQ 32B Preview | Nov 27, 2024 | 32B | 32K |
Text input
Text output
|
— | ★ | $$ |
Qwen2.5 Coder 32B Instruct | Nov 11, 2024 | ~500B | 32K |
Text input
Text output
|
★★★★★ | ★★★★★ | $ |
Qwen2.5 7B Instruct | Oct 15, 2024 | ~500B | 32K |
Text input
Text output
|
★ | ★★★ | $$ |
Qwen2.5 72B Instruct | Sep 18, 2024 | ~500B | 32K |
Text input
Text output
|
★★★ | ★★★ | $$$ |
Qwen 2 72B Instruct | Jun 06, 2024 | ~500B | 32K |
Text input
Text output
|
★★★★ | ★★ | $$$$ |