Meta: Llama 3.2 90B Vision Instruct

Text input Image input Text output
Author's Description

The Llama 90B Vision model is a top-tier, 90-billion-parameter multimodal model designed for the most challenging visual reasoning and language tasks. It offers unparalleled accuracy in image captioning, visual question answering, and advanced image-text comprehension. Pre-trained on vast multimodal datasets and fine-tuned with human feedback, the Llama 90B Vision is engineered to handle the most demanding image-based AI tasks. This model is perfect for industries requiring cutting-edge multimodal AI capabilities, particularly those dealing with complex, real-time visual and textual analysis. Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md). Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).

Key Specifications
Cost
$$$$
Context
131K
Parameters
90B
Released
Sep 24, 2024
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Stop Presence Penalty Logit Bias Top P Temperature Min P Frequency Penalty Max Tokens
Performance Summary

Meta: Llama 3.2 90B Vision Instruct demonstrates a strong overall performance profile, particularly excelling in reliability and offering competitive speed. The model consistently provides usable responses, ranking in the 84th percentile for reliability, indicating very few technical issues. In terms of speed, it performs among the faster models, typically in the top tier (69th percentile). Its pricing is moderate, positioned in the 38th percentile. Analyzing benchmark results, the model shows exceptional accuracy in classification tasks, achieving 99.0% in Email Classification, and strong performance in General Knowledge (97.5%) and Ethics (99.0%). These results highlight its robust understanding of structured information and ethical principles. However, a significant weakness is observed in Coding, where it achieved only 6.0% accuracy, placing it in the 14th percentile. Performance in Instruction Following (51.0%) and Reasoning (50.0%) is moderate, suggesting room for improvement in complex multi-step tasks. Overall, Llama 3.2 90B Vision Instruct is a highly reliable model well-suited for tasks requiring high accuracy in classification and knowledge recall, though its capabilities in coding and complex reasoning may require further development.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $1.2
Completion $1.2

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Together
Together | meta-llama/llama-3.2-90b-vision-instruct 131K $1.2 / 1M tokens $1.2 / 1M tokens
DeepInfra
DeepInfra | meta-llama/llama-3.2-90b-vision-instruct 32K $0.35 / 1M tokens $0.4 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Free Executions Accuracy Cost Duration
Other Models by meta-llama