ByteDance: UI-TARS 7B

Text input Image input Text output
Author's Description

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement...

Key Specifications
Cost
$$
Context
128K
Parameters
7B
Released
Jul 22, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Frequency Penalty Presence Penalty Seed Logit Bias Top P Max Tokens Stop Temperature
Performance Summary

ByteDance's UI-TARS 7B, a multimodal vision-language agent optimized for GUI-based environments, demonstrates a strong overall performance profile, particularly in operational efficiency. The model performs among the fastest models, ranking in the 76th percentile for speed across benchmarks. It also offers highly competitive pricing, consistently placing in the 82nd percentile. Notably, UI-TARS 7B exhibits exceptional reliability, achieving a 99% success rate, indicating minimal technical failures and consistent response delivery. While excelling in its core domain of GUI interaction, as evidenced by state-of-the-art results on OSworld, WebVoyager, AndroidWorld, ScreenSpot, Poki games, and Minecraft, its performance on general baseline benchmarks is mixed. It shows strong accuracy in Ethics (99.0%) and General Knowledge (93.0%), though its General Knowledge ranking is in the lower 32nd percentile. However, it struggles significantly with Hallucinations (64.0% accuracy, 15th percentile), Instruction Following (32.3% accuracy, 28th percentile), and Reasoning (36.0% accuracy, 20th percentile). Its Email Classification accuracy (82.0%) is also notably low at the 8th percentile. Despite these areas for improvement in general cognitive tasks, its specialized capabilities for GUI interaction, coupled with its speed, cost-effectiveness, and high reliability, position it as a robust solution for its intended application.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.1
Completion $0.2
Input Cache Read $0.1

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Parasail
Parasail | bytedance/ui-tars-1.5-7b 128K $0.1 / 1M tokens $0.2 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by bytedance