Inception: Mercury

Text input · Text output
Description

Mercury is the first diffusion large language model (dLLM). Applying a breakthrough discrete diffusion approach, the model runs 5-10x faster than even speed-optimized models like GPT-4.1 Nano and Claude 3.5 Haiku while matching their performance. Mercury's speed enables developers to build responsive user experiences, including voice agents, search interfaces, and chatbots. Read more in the blog post.

Key Specifications

Context Length: 32K
Parameters: Unknown
Created: Jun 26, 2025

Supported Parameters

This model supports the following parameters (a request sketch follows):

Stop, Presence Penalty, Max Tokens, Frequency Penalty
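As an illustration, here is a minimal sketch of how these parameters might appear in an OpenAI-compatible chat-completions payload. The snake_case spellings and example values are assumptions for illustration only; check the provider's API reference for the exact names.

    # Hypothetical request payload for inception/mercury (Python).
    payload = {
        "model": "inception/mercury",
        "messages": [
            {"role": "user", "content": "Summarize diffusion LLMs in one sentence."}
        ],
        "max_tokens": 256,          # Max Tokens: cap on generated tokens
        "stop": ["\n\n"],           # Stop: one or more stop sequences
        "presence_penalty": 0.0,    # Presence Penalty
        "frequency_penalty": 0.0,   # Frequency Penalty
    }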
Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $10
Completion $10
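Because prompt and completion tokens are billed at the same rate, estimating request cost is simple arithmetic: (prompt tokens + completion tokens) / 1,000,000 × $10. A quick sketch with hypothetical token counts:

    # Cost estimate at the listed $10 per 1M tokens (prompt and completion).
    PRICE_PER_MILLION_USD = 10.00

    def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
        """Return the USD cost of one request at Mercury's listed rates."""
        total_tokens = prompt_tokens + completion_tokens
        return total_tokens / 1_000_000 * PRICE_PER_MILLION_USD

    # Example: 2,000 prompt tokens + 500 completion tokens -> $0.025
    print(f"${request_cost(2_000, 500):.4f}")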

Price History

Available Endpoints
Provider    Endpoint Name       Context Length   Pricing (Input)    Pricing (Output)
Inception   inception/mercury   32K              $10 / 1M tokens    $10 / 1M tokens
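A hedged sketch of calling this endpoint over HTTP, assuming an OpenAI-compatible chat-completions API and response shape. The base URL, API key, and response parsing below are placeholders, not details stated on this page; the payload mirrors the parameter sketch above.

    import requests

    # Placeholder gateway URL and key; substitute the real values for your provider.
    BASE_URL = "https://example-gateway/api/v1"
    API_KEY = "YOUR_API_KEY"

    payload = {
        "model": "inception/mercury",  # endpoint name from the table above
        "messages": [{"role": "user", "content": "Hello, Mercury."}],
        "max_tokens": 128,
    }

    response = requests.post(
        f"{BASE_URL}/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=payload,
        timeout=60,
    )
    response.raise_for_status()
    # Assumes an OpenAI-style response: choices[0].message.content holds the text.
    print(response.json()["choices"][0]["message"]["content"])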
Benchmark Performance Summary
Benchmark   Category   Reasoning   Free Executions   Accuracy   Cost   Duration