Inception: Mercury 2

Modalities: text input, text output
Author's Description

Mercury 2 is an extremely fast reasoning LLM, and the first reasoning diffusion LLM (dLLM). Instead of generating tokens sequentially, Mercury 2 produces and refines multiple tokens in parallel, achieving output speeds of over 1,000 tokens per second.

Key Specifications

Cost: $$$
Context: 128K tokens
Released: Mar 04, 2026
Supported Parameters

This model supports the following parameters:

Tools, Tool Choice, Temperature, Include Reasoning, Reasoning, Max Tokens, Structured Outputs, Response Format, Stop
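As a sketch of how these parameters might map onto an OpenAI-compatible chat-completions request body (the wire format and exact parameter names here are assumptions based on common provider conventions, not something this page specifies):

```python
# Hypothetical request payload for Mercury 2 via an OpenAI-compatible API.
# Keys mirror the supported-parameter list above; the exact field names
# (especially "include_reasoning") are assumed, not documented here.
def build_request(prompt: str) -> dict:
    return {
        "model": "inception/mercury-2-20260304",  # endpoint name from this page
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
        "max_tokens": 1024,
        "stop": ["\n\n"],
        "tools": [],                        # tool definitions would go here
        "tool_choice": "auto",
        "include_reasoning": True,          # assumed name for the reasoning toggle
        "response_format": {"type": "json_object"},
    }

req = build_request("Summarize diffusion LLM decoding in one sentence.")
```

A real client would POST this payload to the provider's chat-completions endpoint; parameters you omit fall back to provider defaults.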
Features

This model supports the following features:

Structured Outputs, Reasoning, Response Format, Tools
Performance Summary

Inception: Mercury 2, released on March 4, 2026, is positioned as an extremely fast reasoning LLM, notable for being the first reasoning diffusion LLM (dLLM). It achieves over 1,000 tokens/sec by generating and refining tokens in parallel, making it significantly faster than leading speed-optimized models. The model consistently ranks among the fastest, placing in the 95th percentile across eight benchmarks. It offers competitive pricing, ranking in the 52nd percentile, and demonstrates exceptional reliability with a 100% success rate across all benchmarks, indicating consistent and usable responses.

Mercury 2 exhibits strong performance in specific areas. It achieved perfect accuracy in Hallucinations (100%), demonstrating an excellent ability to acknowledge uncertainty, and also performed very well in Email Classification (99.0% accuracy) and Reasoning (96.0% accuracy). These categories also highlight its speed: it is often the most accurate among models of comparable speed.

However, the model shows notable weaknesses in General Knowledge (7.0% accuracy), Coding (60.0% accuracy), Mathematics (62.0% accuracy), and Ethics (93.0% accuracy, but only 23rd percentile), suggesting limitations in broad factual recall and complex problem-solving beyond its core reasoning strengths. Its Instruction Following accuracy (52.5%) is also below average.

Mercury 2's tunable reasoning levels, 128K context, native tool use, and schema-aligned JSON output make it particularly suited for latency-sensitive applications such as coding workflows, real-time voice/search, and agent loops.
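The schema-aligned JSON output mentioned above can be illustrated with a minimal validation sketch. The schema fields and the sample reply below are hypothetical stand-ins, not actual Mercury 2 output:

```python
import json

# Hypothetical expected fields for a structured-output request; a real
# call would pass a JSON schema via the response_format parameter.
schema_fields = {"answer": str, "confidence": float}

reply = '{"answer": "42", "confidence": 0.9}'  # stand-in for a model reply

def conforms(raw: str, fields: dict) -> bool:
    """Check that raw is JSON with exactly the expected fields and types."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (set(obj) == set(fields)
            and all(isinstance(obj[k], t) for k, t in fields.items()))

assert conforms(reply, schema_fields)
```

With schema-aligned output, checks like this should pass by construction; the validator is still useful as a guard in agent loops where malformed output must not propagate.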

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.25
Completion $0.75
Input Cache Read $0.025

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Inception inception/mercury-2-20260304 128K $0.25 / 1M tokens $0.75 / 1M tokens