AllenAI: Molmo2 8B

Text input Image input Video input Text output Unavailable
Author's Description

Molmo2-8B is an open vision-language model developed by the Allen Institute for AI (Ai2) as part of the Molmo2 family, supporting image, video, and multi-image understanding and grounding. It is based on Qwen3-8B and uses SigLIP 2 as its vision backbone, outperforming other open-weight, open-data models on short videos, counting, and captioning, while remaining competitive on long-video tasks.

Key Specifications
Context
36K
Parameters
8B
Released
Jan 09, 2026
Supported Parameters

This model supports the following parameters:

Top P Seed Min P Temperature Stop Max Tokens Presence Penalty Logit Bias Frequency Penalty
Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0
Completion $0

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Parasail
Parasail | allenai/molmo-2-8b-20260109 36K $0 / 1M tokens $0 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by allenai