Author's Description
UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Developed by ByteDance, it extends the UI-TARS framework with reinforcement learning-based reasoning, enabling robust action planning and execution across virtual interfaces. The model achieves state-of-the-art results on a range of interactive and grounding benchmarks, including OSWorld, WebVoyager, AndroidWorld, and ScreenSpot. It also demonstrates perfect task completion across diverse Poki games and outperforms prior models in Minecraft agent tasks. UI-TARS-1.5 supports thought decomposition during inference and scales well across variants, with the 1.5 version notably exceeding the performance of the earlier 72B and 7B checkpoints.
Performance Summary
ByteDance: UI-TARS 7B performs well overall, particularly in operational efficiency. It ranks in the 75th percentile for speed and the 81st percentile for price competitiveness, and it shows a 99% success rate across benchmarks, indicating minimal technical failures. On individual benchmarks, it is strongest in ethical reasoning (99.0% accuracy) and general knowledge (93.0%), and it performs well in mathematics (78.0%) and coding (68.0%). Its most significant weakness is hallucination mitigation, at only 64.0% accuracy, which suggests a tendency to generate responses about fictional concepts. Its results in email classification (82.0%), instruction following (32.3%), and complex reasoning (36.0%) are also below average compared to other models. Despite these weaknesses, its core strength remains its multimodal vision-language agent capability in GUI-based environments, as highlighted by its state-of-the-art results on interactive and grounding benchmarks such as OSWorld and WebVoyager and its perfect task completion in Poki games.
Model Pricing
Current Pricing
| Feature | Price (per 1M tokens) |
|---|---|
| Prompt | $0.1 |
| Completion | $0.2 |
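
To make the per-million-token prices concrete, the following is a minimal sketch of how the cost of a single request could be estimated from the two rates in the table above. The helper name `estimate_cost` and the example token counts are hypothetical and chosen only for illustration. Actual billing may differ, for example in how image inputs are counted as tokens.

```python
from decimal import Decimal

# Prices taken from the pricing table, in USD per 1M tokens.
PROMPT_PRICE_PER_M = Decimal("0.1")
COMPLETION_PRICE_PER_M = Decimal("0.2")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper)."""
    if prompt_tokens < 0 or completion_tokens < 0:
        raise ValueError("token counts must be non-negative")
    prompt_cost = PROMPT_PRICE_PER_M * prompt_tokens / TOKENS_PER_M
    completion_cost = COMPLETION_PRICE_PER_M * completion_tokens / TOKENS_PER_M
    return prompt_cost + completion_cost


# Illustrative token counts only; they are not measurements of this model.
cost = estimate_cost(prompt_tokens=50_000, completion_tokens=2_000)
print(f"Estimated cost: ${cost}")
```

`Decimal` is used instead of `float` so that small per-request amounts add up without binary rounding error when they are summed over many requests.
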
Available Endpoints
| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
|---|---|---|---|---|
| Parasail | bytedance/ui-tars-1.5-7b | 128K | $0.1 / 1M tokens | $0.2 / 1M tokens |
Benchmark Results
| Benchmark | Category | Reasoning | Strategy | Free | Executions | Accuracy | Cost | Duration |
|---|---|---|---|---|---|---|---|---|
Other Models by ByteDance
| Model | Released | Params | Context | Modalities | Speed | Ability | Cost |
|---|---|---|---|---|---|---|---|
| ByteDance: Seed OSS 36B Instruct (Unavailable) | Sep 02, 2025 | 36B | 131K | Text input, text output | ★ | ★★★ | $$$$ |