Meta: Llama 3.2 11B Vision Instruct

Text input · Image input · Text output
Author's Description

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and visual question answering, bridging the gap between language generation and visual reasoning. Pre-trained on a massive dataset of image-text pairs, it performs well in complex, high-accuracy image analysis. Its ability to integrate visual understanding with language processing makes it an ideal solution for industries requiring comprehensive visual-linguistic AI applications, such as content creation, AI-driven customer service, and research. Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md). Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
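As a quick orientation, the sketch below shows what a visual question answering request against this model might look like through an OpenAI-compatible chat completions API. The base URL, API key, and image URL are placeholders; the model identifier matches the one used by the endpoints listed further down, but check your provider's documentation for the exact values it expects.

```python
from openai import OpenAI

# Minimal sketch: visual question answering with an image plus a text prompt.
# The base_url and api_key are hypothetical placeholders.
client = OpenAI(
    base_url="https://example-provider.com/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="meta-llama/llama-3.2-11b-vision-instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

Image inputs are passed as `image_url` content parts alongside the text part of the same user message, which is the common convention for OpenAI-compatible vision endpoints.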

Key Specifications
Cost: $$
Context: 128K
Parameters: 11B
Released: Sep 24, 2024
Supported Parameters

This model supports the following parameters:

Response Format, Stop, Seed, Min P, Top P, Max Tokens, Frequency Penalty, Temperature, Presence Penalty
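For illustration, a request exercising these parameters might look like the following. Parameter names follow the OpenAI-compatible convention; `min_p` is not a standard OpenAI field, so it is passed through `extra_body` here, which is an assumption about how a given provider exposes it. The values are arbitrary examples, not tuning recommendations.

```python
from openai import OpenAI

# Hypothetical endpoint and key; substitute your provider's values.
client = OpenAI(base_url="https://example-provider.com/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="meta-llama/llama-3.2-11b-vision-instruct",
    messages=[{"role": "user", "content": "Summarize the benefits of multimodal models in two sentences."}],
    temperature=0.7,             # sampling temperature
    top_p=0.9,                   # nucleus sampling cutoff
    max_tokens=256,              # completion length cap
    frequency_penalty=0.0,
    presence_penalty=0.0,
    seed=42,                     # best-effort reproducible sampling
    stop=["\n\n"],               # stop sequence
    extra_body={"min_p": 0.05},  # provider-specific extension (assumption)
)
print(response.choices[0].message.content)
```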
Features

This model supports the following features:

Response Format
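The Response Format feature is typically used to request structured (JSON) output. A minimal sketch, assuming the serving provider honors the OpenAI-compatible `response_format={"type": "json_object"}` convention:

```python
from openai import OpenAI

client = OpenAI(base_url="https://example-provider.com/v1", api_key="YOUR_API_KEY")  # placeholders

response = client.chat.completions.create(
    model="meta-llama/llama-3.2-11b-vision-instruct",
    messages=[
        {"role": "system", "content": "Reply only with a JSON object with keys 'label' and 'confidence'."},
        {"role": "user", "content": "Classify this email subject: 'Your invoice is overdue'."},
    ],
    response_format={"type": "json_object"},  # ask for a JSON-formatted reply
)
print(response.choices[0].message.content)
```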
Performance Summary

Meta's Llama 3.2 11B Vision Instruct, released on September 24, 2024, demonstrates strong performance in several key areas. It consistently ranks among the fastest models and offers highly competitive pricing, placing in the 90th percentile across 7 benchmarks, and it exhibits exceptional reliability with a 95% success rate, indicating minimal technical failures.

On specific benchmarks, Llama 3.2 11B Vision shows a notable strength in Ethics, achieving 98.0% accuracy. It also performs reasonably well in General Knowledge (89.3% accuracy) and Email Classification (94.0% accuracy).

However, the model exhibits significant weaknesses in Instruction Following, with one benchmark showing 0.0% accuracy and another at 29.0%. Its performance in Mathematics (50.0% accuracy), Reasoning (25.0% accuracy), and Coding (64.0% accuracy) suggests these areas could benefit from further improvement. The model's ability to integrate visual understanding with language processing nonetheless makes it suitable for multimodal applications, despite the gaps in complex reasoning and instruction adherence.

Model Pricing

Current Pricing

| Feature | Price (per 1M tokens) |
| --- | --- |
| Prompt | $0.049 |
| Completion | $0.68 |
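As a worked example of how these rates translate into per-request cost, the snippet below estimates the price of a single call using made-up token counts:

```python
# Worked example: estimating the cost of one request at the listed rates
# ($0.049 per 1M prompt tokens, $0.68 per 1M completion tokens).
# The token counts are example values only.
PROMPT_PRICE_PER_M = 0.049     # USD per 1M prompt tokens
COMPLETION_PRICE_PER_M = 0.68  # USD per 1M completion tokens

prompt_tokens = 120_000
completion_tokens = 8_000

cost = (prompt_tokens / 1_000_000) * PROMPT_PRICE_PER_M + (
    completion_tokens / 1_000_000
) * COMPLETION_PRICE_PER_M
print(f"Estimated cost: ${cost:.4f}")  # ≈ $0.0113
```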

Price History

Available Endpoints
| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
| --- | --- | --- | --- | --- |
| DeepInfra | meta-llama/llama-3.2-11b-vision-instruct | 131K | $0.049 / 1M tokens | $0.049 / 1M tokens |
| Cloudflare | meta-llama/llama-3.2-11b-vision-instruct | 128K | $0.049 / 1M tokens | $0.68 / 1M tokens |
| Lambda | meta-llama/llama-3.2-11b-vision-instruct | 131K | $0.049 / 1M tokens | $0.049 / 1M tokens |
| InferenceNet | meta-llama/llama-3.2-11b-vision-instruct | 16K | $0.055 / 1M tokens | $0.055 / 1M tokens |
| Together | meta-llama/llama-3.2-11b-vision-instruct | 131K | $0.18 / 1M tokens | $0.18 / 1M tokens |
| Together | meta-llama/llama-3.2-11b-vision-instruct | 131K | $0.049 / 1M tokens | $0.049 / 1M tokens |
| DeepInfra | meta-llama/llama-3.2-11b-vision-instruct | 131K | $0.049 / 1M tokens | $0.049 / 1M tokens |
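Because the listed endpoints differ in both context length and price, it can be useful to filter them programmatically. A small sketch using figures restated from the table above (the data structure and selection logic are illustrative, not part of any provider API):

```python
# Sketch: filter endpoints by required context length, then pick the cheapest
# by combined input + output rate (USD per 1M tokens).
endpoints = [
    {"provider": "DeepInfra",    "context": 131_000, "input": 0.049, "output": 0.049},
    {"provider": "Cloudflare",   "context": 128_000, "input": 0.049, "output": 0.68},
    {"provider": "Lambda",       "context": 131_000, "input": 0.049, "output": 0.049},
    {"provider": "InferenceNet", "context": 16_000,  "input": 0.055, "output": 0.055},
    {"provider": "Together",     "context": 131_000, "input": 0.18,  "output": 0.18},
    {"provider": "Together",     "context": 131_000, "input": 0.049, "output": 0.049},
]

required_context = 100_000  # tokens needed for the workload (example value)
candidates = [e for e in endpoints if e["context"] >= required_context]
cheapest = min(candidates, key=lambda e: e["input"] + e["output"])
print(cheapest["provider"], cheapest["input"], cheapest["output"])
```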
Benchmark Results
Benchmark · Category · Reasoning Strategy · Free · Executions · Accuracy · Cost · Duration