Microsoft: Phi 4 Reasoning Plus

Text input Text output
Author's Description

Phi-4-reasoning-plus is an enhanced 14B parameter model from Microsoft, fine-tuned from Phi-4 with additional reinforcement learning to boost accuracy on math, science, and code reasoning tasks. It uses the same dense decoder-only transformer architecture as Phi-4, but generates longer, more comprehensive outputs structured into a step-by-step reasoning trace and final answer. While it offers improved benchmark scores over Phi-4-reasoning across tasks like AIME, OmniMath, and HumanEvalPlus, its responses are typically ~50% longer, resulting in higher latency. Designed for English-only applications, it is well-suited for structured reasoning workflows where output quality takes priority over response speed.

Key Specifications
Cost
$$$$
Context
32K
Parameters
14B (Rumoured)
Released
May 01, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Reasoning Include Reasoning Response Format Stop Seed Min P Top P Max Tokens Frequency Penalty Temperature Presence Penalty
Features

This model supports the following features:

Reasoning Response Format
Performance Summary

Microsoft's Phi 4 Reasoning Plus, an enhanced 14B parameter model, demonstrates moderate speed performance, ranking in the 27th percentile across benchmarks. It offers competitive pricing, placing in the 47th percentile. Notably, the model exhibits exceptional reliability with a 99% success rate, indicating minimal technical failures. In terms of benchmark performance, Phi 4 Reasoning Plus shows a mixed profile. It excels in Ethics, achieving perfect 100% accuracy, making it the most accurate model at its price point and among models of similar speed. Its Coding performance is strong at 89.0% accuracy, placing it in the 69th percentile. General Knowledge is solid at 95.5% accuracy. However, the model struggles significantly with Instruction Following, achieving only 12.1% accuracy, and shows lower accuracy in Email Classification at 88.0%. Its key strength lies in its robust reasoning capabilities, particularly in ethical scenarios and coding, while its primary weakness is its limited instruction following precision. Designed for English-only structured reasoning workflows, its longer, more comprehensive outputs contribute to higher latency but prioritize output quality.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.07
Completion $0.35

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
DeepInfra
DeepInfra | microsoft/phi-4-reasoning-plus-04-30 32K $0.07 / 1M tokens $0.35 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by microsoft