Inception: Mercury

Text input Text output
Author's Description

Mercury is the first diffusion large language model (dLLM). Applying a breakthrough discrete diffusion approach, the model runs 5-10x faster than even speed optimized models like GPT-4.1 Nano and Claude 3.5 Haiku while matching their performance. Mercury's speed enables developers to provide responsive user experiences, including with voice agents, search interfaces, and chatbots. Read more in the blog post here.

Key Specifications
Cost
$$$$$
Context
128K
Released
Jun 26, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Tools Structured Outputs Tool Choice Response Format Stop Top P Max Tokens Frequency Penalty Presence Penalty Temperature
Features

This model supports the following features:

Tools Response Format Structured Outputs
Performance Summary

Inception: Mercury, the first diffusion large language model (dLLM), demonstrates exceptional speed and competitive pricing. It consistently ranks among the fastest models, achieving Infinityth percentile across 8 benchmarks, and offers among the most competitive pricing, also at the Infinityth percentile across 8 benchmarks. Mercury exhibits strong performance in several key areas. It achieves near-perfect accuracy in Ethics (100%), ranking in the top 3 for speed and being the most accurate model at its price point. In Hallucinations (Baseline), it shows high accuracy (98.0%), placing in the top 3 for speed and being the most accurate among models of comparable speed. General Knowledge (Baseline) also highlights its speed, achieving 97.5% accuracy and ranking #1 in speed. Coding (Baseline) is another strength, with 82.0% accuracy and a #1 speed ranking. However, the model shows significant weaknesses in Mathematics (Baseline) and Reasoning (Baseline), where it scored 0.0% accuracy in both. Email Classification (Baseline) and Instruction Following (Baseline) show moderate performance at 96.0% and 47.0% accuracy respectively, with the former being noted for its speed. Overall, Mercury excels in speed-sensitive applications and ethical considerations, but its current capabilities in complex mathematical and reasoning tasks are limited.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.25
Completion $1

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Inception
Inception | inception/mercury 128K $0.25 / 1M tokens $1 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by inception