Author's Description
The Llama 90B Vision model is a top-tier, 90-billion-parameter multimodal model designed for the most challenging visual reasoning and language tasks. It offers unparalleled accuracy in image captioning, visual question answering, and advanced image-text comprehension. Pre-trained on vast multimodal datasets and fine-tuned with human feedback, the Llama 90B Vision is engineered to handle the most demanding image-based AI tasks. This model is perfect for industries requiring cutting-edge multimodal AI capabilities, particularly those dealing with complex, real-time visual and textual analysis. Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md). Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
Key Specifications
Supported Parameters
This model supports the following parameters:
Performance Summary
Meta: Llama 3.2 90B Vision Instruct demonstrates a strong overall performance profile, particularly excelling in reliability and offering competitive speed. The model consistently provides usable responses, ranking in the 84th percentile for reliability, indicating very few technical issues. In terms of speed, it performs among the faster models, typically in the top tier (69th percentile). Its pricing is moderate, positioned in the 38th percentile. Analyzing benchmark results, the model shows exceptional accuracy in classification tasks, achieving 99.0% in Email Classification, and strong performance in General Knowledge (97.5%) and Ethics (99.0%). These results highlight its robust understanding of structured information and ethical principles. However, a significant weakness is observed in Coding, where it achieved only 6.0% accuracy, placing it in the 14th percentile. Performance in Instruction Following (51.0%) and Reasoning (50.0%) is moderate, suggesting room for improvement in complex multi-step tasks. Overall, Llama 3.2 90B Vision Instruct is a highly reliable model well-suited for tasks requiring high accuracy in classification and knowledge recall, though its capabilities in coding and complex reasoning may require further development.
Model Pricing
Current Pricing
Feature | Price (per 1M tokens) |
---|---|
Prompt | $1.2 |
Completion | $1.2 |
Price History
Available Endpoints
Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
---|---|---|---|---|
Together
|
Together | meta-llama/llama-3.2-90b-vision-instruct | 131K | $1.2 / 1M tokens | $1.2 / 1M tokens |
DeepInfra
|
DeepInfra | meta-llama/llama-3.2-90b-vision-instruct | 32K | $0.35 / 1M tokens | $0.4 / 1M tokens |
Benchmark Results
Benchmark | Category | Reasoning | Free | Executions | Accuracy | Cost | Duration |
---|
Other Models by meta-llama
|
Released | Params | Context |
|
Speed | Ability | Cost |
---|---|---|---|---|---|---|---|
Meta: Llama Guard 4 12B | Apr 29, 2025 | 12B | 163K |
Text input
Image input
Text output
|
★★★★ | ★ | $ |
Meta: Llama 4 Maverick | Apr 05, 2025 | 17B | 1M |
Text input
Image input
Text output
|
★★★★ | ★★★ | $$$ |
Meta: Llama 4 Scout | Apr 05, 2025 | 17B | 1M |
Text input
Image input
Text output
|
★★★★ | ★★★ | $$ |
Llama Guard 3 8B | Feb 12, 2025 | 8B | 131K |
Text input
Text output
|
★ | ★ | $ |
Meta: Llama 3.3 70B Instruct | Dec 06, 2024 | 70B | 131K |
Text input
Text output
|
★★★★ | ★★★★★ | $ |
Meta: Llama 3.2 1B Instruct | Sep 24, 2024 | 1B | 131K |
Text input
Text output
|
★★ | ★ | $ |
Meta: Llama 3.2 3B Instruct | Sep 24, 2024 | 3B | 131K |
Text input
Text output
|
★★★ | ★ | $ |
Meta: Llama 3.2 11B Vision Instruct | Sep 24, 2024 | 11B | 128K |
Text input
Image input
Text output
|
★★★ | ★★ | $$ |
Meta: Llama 3.1 405B (base) | Aug 01, 2024 | 405B | 32K |
Text input
Text output
|
★ | ★ | $$$$ |
Meta: Llama 3.1 70B Instruct | Jul 22, 2024 | 70B | 131K |
Text input
Text output
|
★★★★ | ★★ | $$ |
Meta: Llama 3.1 405B Instruct | Jul 22, 2024 | 405B | 32K |
Text input
Text output
|
★★★★ | ★★ | $$$$ |
Meta: Llama 3.1 8B Instruct | Jul 22, 2024 | 8B | 131K |
Text input
Text output
|
★★★ | ★★★ | $ |
Meta: LlamaGuard 2 8B | May 12, 2024 | 8B | 8K |
Text input
Text output
|
★★★★ | ★ | $$ |
Meta: Llama 3 8B Instruct | Apr 17, 2024 | 8B | 8K |
Text input
Text output
|
★★★ | ★★ | $ |
Meta: Llama 3 70B Instruct | Apr 17, 2024 | 70B | 8K |
Text input
Text output
|
★★★★ | ★★ | $$$ |
Meta: Llama 2 70B Chat Unavailable | Jun 19, 2023 | 70B | 4K |
Text input
Text output
|
— | — | $$$$ |