Meta: Llama 3.2 11B Vision Instruct

Input: text, images. Output: text.
Author's Description

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and...
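As a rough illustration of how an image and a text prompt are combined in one request, here is a minimal sketch of a chat payload, assuming an OpenAI-compatible chat/completions API (the content-part shapes and the example image URL are assumptions, not confirmed by this page):

```python
def build_vision_request(image_url: str, question: str) -> dict:
    """Build a chat payload that pairs one image with a text prompt."""
    return {
        "model": "meta-llama/llama-3.2-11b-vision-instruct",
        "messages": [
            {
                "role": "user",
                "content": [
                    # Text part and image part travel in the same user turn.
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_vision_request("https://example.com/cat.jpg", "Caption this image.")
```

The payload is then sent to whichever provider endpoint hosts the model; the exact request URL varies by provider.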

Key Specifications
Cost:        $$
Context:     128K
Parameters:  11B
Released:    Sep 24, 2024
Supported Parameters

This model supports the following parameters:

Seed, Frequency Penalty, Top P, Min P, Response Format, Temperature, Stop, Presence Penalty, Max Tokens
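A minimal sketch of attaching these sampling parameters to a request body. The snake_case names follow common OpenAI-style conventions; exact spellings may differ per provider, so the `supported` set below is an assumption derived from the list above:

```python
def with_sampling(payload: dict, **params) -> dict:
    """Merge sampling parameters into a request payload, rejecting unknown keys."""
    supported = {
        "seed", "frequency_penalty", "top_p", "min_p", "response_format",
        "temperature", "stop", "presence_penalty", "max_tokens",
    }
    unknown = set(params) - supported
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    return {**payload, **params}

req = with_sampling(
    {"model": "meta-llama/llama-3.2-11b-vision-instruct"},
    temperature=0.7, top_p=0.9, max_tokens=256, seed=42,
)
```

Validating parameter names client-side keeps typos from being silently ignored by a provider.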
Features

This model supports the following features:

Response Format
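Response Format is the only listed feature; a minimal sketch of using it, assuming OpenAI-style JSON mode (the `json_object` type and the sample reply are illustrative, not confirmed by this page):

```python
import json

request = {
    "model": "meta-llama/llama-3.2-11b-vision-instruct",
    "messages": [{"role": "user", "content": "List three primary colors as JSON."}],
    # Ask the model to emit valid JSON rather than free-form text.
    "response_format": {"type": "json_object"},
}

# A conforming reply should parse cleanly (illustrative string):
reply = '{"colors": ["red", "yellow", "blue"]}'
parsed = json.loads(reply)
```

JSON mode is useful when downstream code consumes the reply programmatically instead of displaying it.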
Performance Summary

Meta's Llama 3.2 11B Vision Instruct, released on September 24, 2024, demonstrates exceptional speed, consistently ranking among the fastest models across all 8 benchmarks. Its pricing is highly competitive, placing in the 90th percentile across 7 benchmarks, and its 95% success rate across 8 benchmarks indicates minimal technical failures and consistently evaluable responses.

As a multimodal model designed for visual and textual data, Llama 3.2 11B Vision bridges language generation and visual reasoning. Its key strengths are cost-effectiveness and speed, making it an efficient choice for many applications. It performs strongly in Ethics (98.0% accuracy) and Email Classification (94.0% accuracy).

Performance on other knowledge-based and reasoning tasks is more modest: General Knowledge (89.3% accuracy) and Coding (64.0% accuracy) are average, while Mathematics (50.0% accuracy) and Reasoning (25.0% accuracy) are notable weaknesses. The model also struggles with complex instruction following, with one benchmark showing 0.0% accuracy. In short, it processes and classifies information well, but its ability to execute multi-layered, conditional instructions needs improvement.

Model Pricing

Current Pricing

Feature      Price (per 1M tokens)
Prompt       $0.049
Completion   $0.049
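Since prompt and completion tokens share the same flat rate, total cost is a single multiplication. A worked example for a hypothetical request with 12,000 prompt tokens and 800 completion tokens:

```python
PRICE_PER_M = 0.049  # USD per 1M tokens, prompt and completion alike

def cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    """Total request cost at a flat per-token rate."""
    return (prompt_tokens + completion_tokens) / 1_000_000 * PRICE_PER_M

total = cost_usd(12_000, 800)  # 12,800 tokens -> $0.0006272
```

At this rate, a full 128K-token context costs well under a cent per request.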

Available Endpoints
All endpoints serve meta-llama/llama-3.2-11b-vision-instruct.

Provider       Context Length   Pricing (Input)      Pricing (Output)
DeepInfra      131K             $0.049 / 1M tokens   $0.049 / 1M tokens
Cloudflare     128K             $0.049 / 1M tokens   $0.049 / 1M tokens
Lambda         131K             $0.049 / 1M tokens   $0.049 / 1M tokens
InferenceNet   16K              $0.049 / 1M tokens   $0.049 / 1M tokens
Together       131K             $0.049 / 1M tokens   $0.049 / 1M tokens
Benchmark Results