AllenAI: Molmo 7B D

Modalities: text input, image input, text output
Author's Description

Molmo is a family of open vision-language models developed by the Allen Institute for AI. Molmo models are trained on PixMo, a dataset of 1 million highly curated image-text pairs. Molmo achieves state-of-the-art performance among similarly sized multimodal models while being fully open-source. You can find all models in the Molmo family [here](https://huggingface.co/collections/allenai/molmo-66f379e6fe3b8ef090a8ca19). Learn more about the Molmo family [in the announcement blog post](https://molmo.allenai.org/blog) or the [paper](https://huggingface.co/papers/2409.17146). Molmo 7B-D is based on [Qwen2-7B](https://huggingface.co/Qwen/Qwen2-7B) and uses [OpenAI CLIP](https://huggingface.co/openai/clip-vit-large-patch14-336) as its vision backbone. It performs comfortably between GPT-4V and GPT-4o on both academic benchmarks and human evaluation. This checkpoint is a preview of the Molmo release. All artifacts used in creating Molmo (PixMo dataset, training code, evaluations, intermediate checkpoints) will be made available at a later date, furthering our commitment to open-source AI development and reproducibility.
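
The weights are published on Hugging Face (see the collection linked above). Below is a minimal local-inference sketch, assuming the repository id `allenai/Molmo-7B-D-0924` and the custom processing code bundled with that checkpoint; `processor.process` and `model.generate_from_batch` come from the remote code loaded via `trust_remote_code=True`, not from the standard transformers API.

```python
# Minimal sketch: load Molmo 7B-D locally with Hugging Face transformers.
# Assumes the repo id allenai/Molmo-7B-D-0924 and its bundled remote code.
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

repo = "allenai/Molmo-7B-D-0924"
processor = AutoProcessor.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)

# Prepare one image and one text prompt with the checkpoint's custom processor.
image = Image.open(requests.get("https://picsum.photos/id/237/536/354", stream=True).raw)
inputs = processor.process(images=[image], text="Describe this image.")
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}

# Generate, then decode only the newly produced tokens.
output = model.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer,
)
generated = output[0, inputs["input_ids"].size(1):]
print(processor.tokenizer.decode(generated, skip_special_tokens=True))
```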

Key Specifications
Context: 4K
Parameters: 7B
Released: Mar 26, 2025
Supported Parameters

This model supports the following parameters:

Temperature, Frequency Penalty, Stop, Seed, Logit Bias, Presence Penalty, Max Tokens, Top P, Min P
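
These map onto the standard sampling controls of an OpenAI-compatible chat-completions request. A hedged sketch follows; the base URL, API key, model slug, and exact request shape are placeholders rather than part of this listing.

```python
# Hedged sketch: send a multimodal request with the supported sampling
# parameters to an assumed OpenAI-compatible chat-completions endpoint.
import requests

payload = {
    "model": "allenai/molmo-7b-d-0924",
    "messages": [
        {"role": "user", "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ]},
    ],
    # Sampling parameters from the supported list above
    "temperature": 0.7,
    "top_p": 0.9,
    "min_p": 0.05,
    "max_tokens": 256,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "stop": ["###"],
    "seed": 42,
    "logit_bias": {},
}

response = requests.post(
    "https://api.example.com/v1/chat/completions",  # placeholder base URL
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json=payload,
    timeout=120,
)
print(response.json()["choices"][0]["message"]["content"])
```
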
Model Pricing

Current Pricing

| Feature    | Price (per 1M tokens) |
|------------|-----------------------|
| Prompt     | $0.10                 |
| Completion | $0.20                 |
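
As a quick sanity check on these rates, per-request cost is just the token count times the per-token price (the per-1M rate divided by 1,000,000). A small illustrative calculation with hypothetical token counts:

```python
# Worked example of the listed rates: $0.10 / 1M prompt tokens, $0.20 / 1M completion tokens.
PROMPT_RATE = 0.10 / 1_000_000      # USD per prompt token
COMPLETION_RATE = 0.20 / 1_000_000  # USD per completion token

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Return the USD cost of a single request at the current rates."""
    return prompt_tokens * PROMPT_RATE + completion_tokens * COMPLETION_RATE

# e.g. a 3,000-token prompt (image tiles + text) with a 500-token answer:
print(f"${request_cost(3_000, 500):.6f}")  # $0.000400
```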

Available Endpoints
| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
|----------|---------------|----------------|-----------------|------------------|
| Parasail | allenai/molmo-7b-d-0924 | 4K | $0.10 / 1M tokens | $0.20 / 1M tokens |