Author's Description
Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels at tasks such as image captioning and visual question answering, bridging the gap between language generation and visual reasoning. Pre-trained on a massive dataset of image-text pairs, it performs well in complex, high-accuracy image analysis. Its ability to integrate visual understanding with language processing makes it an ideal solution for industries requiring comprehensive visual-linguistic AI applications, such as content creation, AI-driven customer service, and research. See the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md). Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
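Many of the providers listed under Available Endpoints below expose an OpenAI-compatible chat-completions API, so a visual question answering call looks roughly like the sketch below. The `API_BASE`, `API_KEY`, and image URL are hypothetical placeholders, not values from this page, and the exact multimodal message schema can vary by provider.

```python
# A minimal sketch of visual question answering against an
# OpenAI-compatible chat-completions endpoint (assumed, not confirmed
# for every provider on this page). Substitute your provider's values.
import requests

API_BASE = "https://api.example.com/v1"  # hypothetical provider base URL
API_KEY = "sk-..."                       # your provider API key

payload = {
    "model": "meta-llama/llama-3.2-11b-vision-instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is happening in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
    "max_tokens": 256,
}

resp = requests.post(
    f"{API_BASE}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```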
Performance Summary
Meta's Llama 3.2 11B Vision Instruct, released on September 24, 2024, performs strongly in several key areas. It consistently ranks among the fastest models and offers highly competitive pricing, placing in the 90th percentile across 7 benchmarks, and it is exceptionally reliable, with a 95% success rate indicating minimal technical failures. On individual benchmarks, the model is notably strong in Ethics (98.0% accuracy) and performs well in Email Classification (94.0%) and General Knowledge (89.3%). It is much weaker at Instruction Following, scoring 0.0% on one benchmark and 29.0% on another, and its results in Mathematics (50.0%), Reasoning (25.0%), and Coding (64.0%) leave room for improvement. Overall, its integration of visual understanding with language processing makes it a good fit for multimodal applications, despite these gaps in complex reasoning and instruction adherence.
Model Pricing
Current Pricing
| Feature | Price (per 1M tokens) |
|---|---|
| Prompt | $0.049 |
| Completion | $0.68 |
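As a quick sanity check on these rates, here is a minimal sketch of the per-request cost arithmetic; the 2,000-prompt / 500-completion token split is an invented example, not a figure from this page.

```python
# Per-request cost under the listed rates:
# $0.049 per 1M prompt tokens, $0.68 per 1M completion tokens.
PROMPT_RATE = 0.049 / 1_000_000      # USD per prompt token
COMPLETION_RATE = 0.68 / 1_000_000   # USD per completion token

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated USD cost of one request at the posted rates."""
    return prompt_tokens * PROMPT_RATE + completion_tokens * COMPLETION_RATE

# Example: a 2,000-token prompt that yields a 500-token completion.
print(f"${request_cost(2_000, 500):.6f}")  # -> $0.000438
```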
Available Endpoints
| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
|---|---|---|---|---|
| DeepInfra | meta-llama/llama-3.2-11b-vision-instruct | 131K | $0.049 / 1M tokens | $0.049 / 1M tokens |
| Cloudflare | meta-llama/llama-3.2-11b-vision-instruct | 128K | $0.049 / 1M tokens | $0.68 / 1M tokens |
| Lambda | meta-llama/llama-3.2-11b-vision-instruct | 131K | $0.049 / 1M tokens | $0.049 / 1M tokens |
| InferenceNet | meta-llama/llama-3.2-11b-vision-instruct | 16K | $0.055 / 1M tokens | $0.055 / 1M tokens |
| Together | meta-llama/llama-3.2-11b-vision-instruct | 131K | $0.18 / 1M tokens | $0.18 / 1M tokens |
| Together | meta-llama/llama-3.2-11b-vision-instruct | 131K | $0.049 / 1M tokens | $0.049 / 1M tokens |
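One way to choose among these endpoints is to rank them by a blended per-token rate. The sketch below hand-copies the rows above into a list and picks the cheapest endpoint that meets a minimum context length; the 80/20 prompt-to-completion split is an assumption to tune for your own workload, and rates should be re-checked against the provider.

```python
# Endpoint rows hand-copied from the table above (prices in USD per 1M tokens).
endpoints = [
    {"provider": "DeepInfra",    "context": 131_000, "input": 0.049, "output": 0.049},
    {"provider": "Cloudflare",   "context": 128_000, "input": 0.049, "output": 0.68},
    {"provider": "Lambda",       "context": 131_000, "input": 0.049, "output": 0.049},
    {"provider": "InferenceNet", "context": 16_000,  "input": 0.055, "output": 0.055},
    {"provider": "Together",     "context": 131_000, "input": 0.18,  "output": 0.18},
    {"provider": "Together",     "context": 131_000, "input": 0.049, "output": 0.049},
]

def cheapest(min_context: int, prompt_share: float = 0.8) -> dict:
    """Cheapest endpoint by blended $/1M tokens among those with enough context.

    `prompt_share` is the assumed fraction of traffic that is prompt tokens.
    """
    eligible = [e for e in endpoints if e["context"] >= min_context]
    return min(
        eligible,
        key=lambda e: prompt_share * e["input"] + (1 - prompt_share) * e["output"],
    )

print(cheapest(min_context=100_000))
# -> one of the $0.049 input / $0.049 output endpoints (DeepInfra here,
#    since min() returns the first of the tied rows)
```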
Other Models by meta-llama
| Model | Released | Params | Context | Modalities | Speed | Ability | Cost |
|---|---|---|---|---|---|---|---|
| Meta: Llama Guard 4 12B | Apr 29, 2025 | 12B | 163K | Text + Image → Text | — | ★ | $$ |
| Meta: Llama 4 Maverick | Apr 05, 2025 | 17B | 1M | Text + Image → Text | ★★★★★ | ★★★ | $$$ |
| Meta: Llama 4 Scout | Apr 05, 2025 | 17B | 327K | Text + Image → Text | ★★★★ | ★★ | $$ |
| Llama Guard 3 8B | Feb 12, 2025 | 8B | 131K | Text → Text | ★★ | ★ | $$ |
| Meta: Llama 3.3 70B Instruct | Dec 06, 2024 | 70B | 131K | Text → Text | ★★★★ | ★★★★ | $ |
| Meta: Llama 3.2 1B Instruct | Sep 24, 2024 | 1B | 131K | Text → Text | ★★ | ★ | $ |
| Meta: Llama 3.2 3B Instruct | Sep 24, 2024 | 3B | 131K | Text → Text | ★★★ | ★ | $ |
| Meta: Llama 3.2 90B Vision Instruct | Sep 24, 2024 | 90B | 131K | Text + Image → Text | ★★★ | ★★ | $$$$ |
| Meta: Llama 3.1 405B (base) | Aug 01, 2024 | 405B | 32K | Text → Text | ★ | ★ | $$$ |
| Meta: Llama 3.1 70B Instruct | Jul 22, 2024 | 70B | 131K | Text → Text | ★★★★ | ★★ | $$ |
| Meta: Llama 3.1 405B Instruct | Jul 22, 2024 | 405B | 32K | Text → Text | ★★★★ | ★★ | $$$ |
| Meta: Llama 3.1 8B Instruct | Jul 22, 2024 | 8B | 131K | Text → Text | ★★★ | ★★ | $ |
| Meta: LlamaGuard 2 8B | May 12, 2024 | 8B | 8K | Text → Text | ★★★★ | ★ | $$ |
| Meta: Llama 3 8B Instruct | Apr 17, 2024 | 8B | 8K | Text → Text | ★★★ | ★★ | $ |
| Meta: Llama 3 70B Instruct | Apr 17, 2024 | 70B | 8K | Text → Text | ★★★★ | ★★ | $$$ |
| Meta: Llama 2 70B Chat (unavailable) | Jun 19, 2023 | 70B | 4K | Text → Text | — | — | $$$$ |