Text input
Text output
Description
Mercury is the first diffusion large language model (dLLM). Applying a breakthrough discrete diffusion approach, the model runs 5-10x faster than even speed optimized models like GPT-4.1 Nano and Claude 3.5 Haiku while matching their performance. Mercury's speed enables developers to provide responsive user experiences, including with voice agents, search interfaces, and chatbots. Read more in the blog post here.
Key Specifications
Context Length
32K
Parameters
Unknown
Created
Jun 26, 2025
Supported Parameters
This model supports the following parameters:
Stop
Presence Penalty
Max Tokens
Frequency Penalty
Model Pricing
Current Pricing
Feature | Price (per 1M tokens) |
---|---|
Prompt | $10 |
Completion | $10 |
Price History
Available Endpoints
Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
---|---|---|---|---|
Inception
|
Inception | inception/mercury | 32K | $10 / 1M tokens | $10 / 1M tokens |
Benchmark Performance Summary
Benchmark | Category | Reasoning | Free | Executions | Accuracy | Cost | Duration |
---|