Arcee AI: Spotlight

Text input Image input Text output
Author's Description

Spotlight is a 7‑billion‑parameter vision‑language model derived from Qwen 2.5‑VL and fine‑tuned by Arcee AI for tight image‑text grounding tasks. It offers a 32 k‑token context window, enabling rich multimodal conversations that combine lengthy documents with one or more images. Training emphasized fast inference on consumer GPUs while retaining strong captioning, visual‐question‑answering, and diagram‑analysis accuracy. As a result, Spotlight slots neatly into agent workflows where screenshots, charts or UI mock‑ups need to be interpreted on the fly. Early benchmarks show it matching or out‑scoring larger VLMs such as LLaVA‑1.6 13 B on popular VQA and POPE alignment tests.

Key Specifications
Cost
$$
Context
131K
Parameters
7B (Rumoured)
Released
May 05, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Logit Bias Stop Min P Top P Max Tokens Frequency Penalty Temperature Presence Penalty
Performance Summary

Arcee AI's Spotlight model, a 7-billion-parameter vision-language model, demonstrates strong performance in specific areas, particularly in speed and reliability. It consistently ranks among the fastest models, achieving the 90th percentile across eight benchmarks, and notably secured the #1 spot in Email Classification and top 3 in Mathematics and Coding for speed. The model also offers competitive pricing, typically providing cost-effective solutions at the 74th percentile. Reliability is exceptional, with a 99% success rate across all benchmarks, indicating minimal technical failures. In terms of benchmark performance, Spotlight excels in Ethics (99.0% accuracy) and shows solid performance in General Knowledge (89.1%) and Email Classification (93.0%). Its core strength lies in its design for tight image-text grounding tasks, making it suitable for agent workflows requiring interpretation of visual data. However, the model exhibits weaknesses in Hallucinations (86.0% accuracy, 29th percentile), General Knowledge (30th percentile), and Reasoning (50.0% accuracy, 33rd percentile), suggesting areas for improvement in handling fictional concepts and complex logical inference. Despite these, its fast inference on consumer GPUs and strong visual-question-answering capabilities make it a compelling option for multimodal applications.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.18
Completion $0.18

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Together
Together | arcee-ai/spotlight 131K $0.18 / 1M tokens $0.18 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by arcee-ai