Author's Description
UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Developed by ByteDance, it extends the UI-TARS framework with reinforcement-learning-based reasoning, enabling robust action planning and execution across virtual interfaces. The model achieves state-of-the-art results on a range of interactive and grounding benchmarks, including OSWorld, WebVoyager, AndroidWorld, and ScreenSpot. It also achieves perfect task completion across a diverse set of Poki games and outperforms prior models on Minecraft agent tasks. UI-TARS-1.5 supports thought decomposition during inference and scales well across variants, with the 1.5 release notably exceeding the performance of the earlier 72B and 7B checkpoints.
Key Specifications
Supported Parameters
This model supports the following parameters:
Performance Summary
ByteDance: UI-TARS 7B, created on July 22, 2025, is a multimodal vision-language agent optimized for GUI-based environments. It consistently ranks among the fastest models (82nd percentile for speed) and offers highly competitive pricing (81st percentile). Its reliability is exceptional: it has minimal technical failures and ranks in the 99th percentile.
Benchmark performance is mixed. Ethics is a notable strength, with 99.0% accuracy (64th percentile) at low cost and short duration. General Knowledge also stands out at 93.0% accuracy, though with higher cost and duration. The model is markedly weaker at Instruction Following (32.3% accuracy) and Reasoning (36.0% accuracy), where it falls into the lower percentiles. Coding accuracy is moderate at 68.0%, but with long run times. Email Classification reaches a high 82.0% accuracy yet ranks only in the 11th percentile, meaning most other models score higher on this task. Overall, UI-TARS 7B excels in reliability and cost-efficiency, while its accuracy varies widely across task types, with particular difficulty in complex instruction following and reasoning.
Model Pricing
Current Pricing
Feature | Price (per 1M tokens) |
---|---|
Prompt | $0.1 |
Completion | $0.2 |
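To make the per-token pricing concrete, the sketch below estimates the cost of a single request from its prompt and completion token counts using the list prices in the table above. It assumes any image input is already included in the prompt token count and that no other fees apply; the function name `estimate_cost` and the example token counts are hypothetical.

```python
# List prices from the pricing table, in US dollars per 1M tokens.
PROMPT_PRICE_PER_M = 0.10
COMPLETION_PRICE_PER_M = 0.20


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Return the estimated USD cost of one request at list prices.

    Assumes image inputs are already counted in prompt_tokens.
    """
    if prompt_tokens < 0 or completion_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        prompt_tokens * PROMPT_PRICE_PER_M
        + completion_tokens * COMPLETION_PRICE_PER_M
    ) / 1_000_000


# Hypothetical request: 5,000 prompt tokens and 500 completion tokens.
print(f"${estimate_cost(5_000, 500):.6f}")  # prints "$0.000600"
```

Because completion tokens cost twice as much as prompt tokens, long generated outputs weigh more heavily on the total than equally long inputs.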
Available Endpoints
Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
---|---|---|---|---|
Parasail | bytedance/ui-tars-1.5-7b | 128K | $0.1 / 1M tokens | $0.2 / 1M tokens |
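The context length listed for the endpoint bounds how much input and output a single request can hold. The sketch below checks whether a prompt plus a reserved completion budget fits. It assumes the window covers prompt and completion tokens combined and interprets "128K" as 128,000 tokens; some providers use 131,072, so confirm the exact figure before relying on it. The helper names are hypothetical.

```python
# Context length listed for the Parasail endpoint. "128K" is interpreted here as
# 128,000 tokens (an assumption; some providers mean 131,072).
CONTEXT_LENGTH = 128_000


def fits_in_context(prompt_tokens: int, max_completion_tokens: int,
                    context_length: int = CONTEXT_LENGTH) -> bool:
    """Return True if the prompt plus the reserved completion budget fits the window.

    Assumes prompt and completion tokens share one context window.
    """
    return prompt_tokens + max_completion_tokens <= context_length


def max_completion_budget(prompt_tokens: int,
                          context_length: int = CONTEXT_LENGTH) -> int:
    """Largest completion budget that still fits, or 0 if the prompt alone overflows."""
    return max(0, context_length - prompt_tokens)


print(fits_in_context(120_000, 4_096))  # True
print(fits_in_context(126_000, 4_096))  # False
print(max_completion_budget(126_000))   # 2000
```

Passing `context_length` explicitly makes it easy to rerun the check with 131,072 if that turns out to be the provider's actual limit.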
Benchmark Results
Benchmark | Category | Reasoning | Free | Executions | Accuracy | Cost | Duration |
---|---|---|---|---|---|---|---|