Author's Description
UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement...
Key Specifications
Supported Parameters
This model supports the following parameters:
Performance Summary
ByteDance's UI-TARS-1.5 7B, a multimodal vision-language agent optimized for GUI-based environments, shows a strong overall profile, particularly in operational efficiency. It ranks in the 76th percentile for speed across benchmarks and in the 82nd percentile for pricing. It is also highly reliable, with a 99% success rate, indicating few technical failures and consistent response delivery.
The model excels in its core domain of GUI interaction, as evidenced by state-of-the-art results on OSWorld, WebVoyager, AndroidWorld, ScreenSpot, Poki games, and Minecraft, but its results on general baseline benchmarks are mixed. Accuracy is high in Ethics (99.0%) and General Knowledge (93.0%), although the General Knowledge score ranks only in the 32nd percentile. The model struggles with Hallucinations (64.0% accuracy, 15th percentile), Instruction Following (32.3% accuracy, 28th percentile), and Reasoning (36.0% accuracy, 20th percentile), and its Email Classification accuracy (82.0%) ranks in just the 8th percentile. Despite these weaknesses in general cognitive tasks, its specialized GUI capabilities, combined with its speed, low cost, and high reliability, make it a robust option for its intended application.
Model Pricing
Current Pricing
| Feature | Price (per 1M tokens) |
|---|---|
| Prompt | $0.1 |
| Completion | $0.2 |
| Input Cache Read | $0.1 |
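As a rough illustration of how these per-million-token rates translate into a per-request cost, here is a minimal Python sketch. The function name `estimate_cost` and its parameters are hypothetical, and the sketch assumes that cached prompt tokens are billed at the Input Cache Read rate instead of the Prompt rate; the listing itself does not spell out the billing rules.

```python
from decimal import Decimal

# Prices in USD per 1M tokens, copied from the pricing table above.
PROMPT_PRICE_PER_M = Decimal("0.1")
COMPLETION_PRICE_PER_M = Decimal("0.2")
CACHE_READ_PRICE_PER_M = Decimal("0.1")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  cached_prompt_tokens: int = 0) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: cached prompt tokens are charged at the cache-read
    rate and the remaining prompt tokens at the prompt rate.
    """
    if min(prompt_tokens, completion_tokens, cached_prompt_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_prompt_tokens > prompt_tokens:
        raise ValueError("cached tokens cannot exceed prompt tokens")

    uncached_tokens = prompt_tokens - cached_prompt_tokens
    total = (
        uncached_tokens * PROMPT_PRICE_PER_M
        + cached_prompt_tokens * CACHE_READ_PRICE_PER_M
        + completion_tokens * COMPLETION_PRICE_PER_M
    )
    return total / TOKENS_PER_M
```

Under these assumptions, a request with 50,000 prompt tokens and 2,000 completion tokens comes to $0.0054. Because the listed cache-read rate equals the prompt rate, caching does not change the estimate at current prices.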
Available Endpoints
| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
|---|---|---|---|---|
| Parasail | bytedance/ui-tars-1.5-7b | 128K | $0.1 / 1M tokens | $0.2 / 1M tokens |
Benchmark Results
| Benchmark | Category | Reasoning | Strategy | Free | Executions | Accuracy | Cost | Duration |
|---|---|---|---|---|---|---|---|---|
Other Models by ByteDance
| Model | Released | Params | Context | Modalities | Speed | Ability | Cost |
|---|---|---|---|---|---|---|---|
| ByteDance: Seed OSS 36B Instruct (Unavailable) | Sep 02, 2025 | 36B | 131K | Text input, text output | ★★ | ★★ | $$$$ |