Author's Description
Phi-4-reasoning-plus is an enhanced 14B parameter model from Microsoft, fine-tuned from Phi-4 with additional reinforcement learning to boost accuracy on math, science, and code reasoning tasks. It uses the same dense decoder-only transformer architecture as Phi-4, but generates longer, more comprehensive outputs structured into a step-by-step reasoning trace and final answer. While it offers improved benchmark scores over Phi-4-reasoning across tasks like AIME, OmniMath, and HumanEvalPlus, its responses are typically ~50% longer, resulting in higher latency. Designed for English-only applications, it is well-suited for structured reasoning workflows where output quality takes priority over response speed.
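Because the model emits a step-by-step reasoning trace before its final answer, downstream code typically needs to separate the two. The sketch below is a minimal example assuming the trace is wrapped in `<think>...</think>` tags, the convention used by the Phi-4-reasoning family; the function name and delimiters are illustrative, so adjust them to match how your deployment formats the trace.

```python
import re

def split_reasoning(response_text: str) -> tuple[str, str]:
    """Split a completion into (reasoning trace, final answer).

    Assumes the trace is delimited by <think>...</think>; if no trace is
    found, the whole response is treated as the final answer.
    """
    match = re.search(r"<think>(.*?)</think>", response_text, flags=re.DOTALL)
    if not match:
        return "", response_text.strip()
    reasoning = match.group(1).strip()
    answer = response_text[match.end():].strip()
    return reasoning, answer
```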
Key Specifications
Supported Parameters
This model supports the following parameters:
Features
This model supports the following features:
Performance Summary
Microsoft's Phi-4-reasoning-plus, created on May 1, 2025, is an enhanced 14B parameter model designed for structured reasoning workflows. It shows moderate speed, ranking in the 25th percentile across benchmarks, so it is not among the fastest models available, but its pricing is competitive at the 45th percentile. A standout feature is its reliability: a perfect 100th percentile, meaning it consistently returns usable responses with minimal technical failures.

The model is remarkably accurate in specific domains. It achieved perfect 100% accuracy in Ethics (Baseline), making it the most accurate model at its price point and speed. Reasoning (Baseline) is also strong at 96.0% accuracy (91st percentile), Coding (Baseline) performs well at 89.0% accuracy (77th percentile), and General Knowledge (Baseline) is solid at 95.5% accuracy.

Its clearest weakness is Instruction Following (Baseline), where it scored only 12.1% accuracy (21st percentile). Email Classification (Baseline) is also lower at 88.0% accuracy (13th percentile), despite being cost-effective and fast for that task. The longer, step-by-step outputs that improve quality on complex tasks also contribute to the model's moderate speed.
Model Pricing
Current Pricing
| Feature | Price (per 1M tokens) |
|---|---|
| Prompt | $0.07 |
| Completion | $0.35 |
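At the listed rates, a per-request cost estimate is just token counts multiplied by the per-token prices. The sketch below hard-codes the prices from the table above; the token counts in the example are illustrative, with a deliberately long completion to reflect the model's verbose reasoning traces.

```python
PROMPT_PRICE = 0.07 / 1_000_000      # $ per prompt token
COMPLETION_PRICE = 0.35 / 1_000_000  # $ per completion token

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the dollar cost of one request at the listed rates."""
    return prompt_tokens * PROMPT_PRICE + completion_tokens * COMPLETION_PRICE

# Example: a 2,000-token prompt with a 6,000-token completion
# costs 2,000 * $0.07/1M + 6,000 * $0.35/1M ≈ $0.0022.
print(f"${request_cost(2_000, 6_000):.4f}")
```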
Price History
Available Endpoints
| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
|---|---|---|---|---|
| DeepInfra | microsoft/phi-4-reasoning-plus-04-30 | 32K | $0.07 / 1M tokens | $0.35 / 1M tokens |
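The endpoint above can be reached with any OpenAI-compatible client pointed at the provider's base URL. The sketch below assumes DeepInfra's OpenAI-compatible base URL and a hypothetical `DEEPINFRA_API_KEY` environment variable; check the provider's documentation for the exact values, and leave generous `max_tokens` headroom since completions run roughly 50% longer than Phi-4-reasoning's.

```python
import os
from openai import OpenAI

# Base URL and env var name are assumptions; substitute your provider's values.
client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",
    api_key=os.environ["DEEPINFRA_API_KEY"],
)

response = client.chat.completions.create(
    model="microsoft/phi-4-reasoning-plus-04-30",
    messages=[{"role": "user", "content": "How many primes are less than 100?"}],
    max_tokens=4096,  # headroom for the long step-by-step reasoning trace
)
print(response.choices[0].message.content)
```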
Benchmark Results
| Benchmark | Category | Reasoning | Free | Executions | Accuracy | Cost | Duration |
|---|---|---|---|---|---|---|---|
Other Models by microsoft
| Model | Released | Params | Context | Modalities | Speed | Ability | Cost |
|---|---|---|---|---|---|---|---|
| Microsoft: MAI DS R1 | Apr 20, 2025 | — | 163K | Text input, Text output | ★★★★ | ★★★★★ | $$ |
| Microsoft: Phi 4 Multimodal Instruct | Mar 07, 2025 | ~5.6B | 131K | Text input, Image input, Text output | ★★ | ★★ | $$ |
| Microsoft: Phi 4 | Jan 09, 2025 | ~14B | 16K | Text input, Text output | ★★★★ | ★★★★ | $$ |
| Microsoft: Phi-3.5 Mini 128K Instruct | Aug 20, 2024 | ~3.8B | 128K | Text input, Text output | ★ | ★★ | $$ |
| Microsoft: Phi-3 Mini 128K Instruct | May 25, 2024 | ~3.8B | 128K | Text input, Text output | ★★★ | ★★★ | $$ |
| Microsoft: Phi-3 Medium 128K Instruct | May 23, 2024 | ~14B | 128K | Text input, Text output | ★★ | ★ | $$$$ |
| WizardLM-2 8x22B | Apr 15, 2024 | 22B | 65K | Text input, Text output | ★★★ | ★★ | $$$ |