Microsoft: Phi 4 Reasoning Plus

Text input Text output
Author's Description

Phi-4-reasoning-plus is an enhanced 14B parameter model from Microsoft, fine-tuned from Phi-4 with additional reinforcement learning to boost accuracy on math, science, and code reasoning tasks. It uses the same dense decoder-only transformer architecture as Phi-4, but generates longer, more comprehensive outputs structured into a step-by-step reasoning trace and final answer. While it offers improved benchmark scores over Phi-4-reasoning across tasks like AIME, OmniMath, and HumanEvalPlus, its responses are typically ~50% longer, resulting in higher latency. Designed for English-only applications, it is well-suited for structured reasoning workflows where output quality takes priority over response speed.

Key Specifications
Cost
$$$$
Context
32K
Parameters
14B (Rumoured)
Released
May 01, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Include Reasoning Stop Presence Penalty Top P Temperature Seed Min P Response Format Reasoning Frequency Penalty Max Tokens
Features

This model supports the following features:

Reasoning Response Format
Performance Summary

Microsoft's Phi-4-reasoning-plus, created on May 1, 2025, is an enhanced 14B parameter model designed for structured reasoning workflows. It exhibits moderate speed performance, ranking in the 25th percentile across benchmarks, indicating it is not among the fastest models available. However, it offers competitive pricing, positioned at the 45th percentile. A standout feature is its exceptional reliability, achieving a perfect 100th percentile, meaning it consistently provides usable responses with minimal technical failures. The model demonstrates remarkable accuracy in specific domains. It achieved perfect 100% accuracy in Ethics (Baseline), making it the most accurate model at its price point and speed. Its Reasoning (Baseline) performance is also strong at 96.0% accuracy, placing it in the 91st percentile. In Coding (Baseline), it performs well with 89.0% accuracy (77th percentile). However, a significant weakness is its Instruction Following (Baseline) capability, where it scored only 12.1% accuracy, ranking in the 21st percentile. Email Classification (Baseline) also shows a lower accuracy of 88.0% (13th percentile), despite being cost-effective and fast for this task. General Knowledge (Baseline) is solid at 95.5% accuracy. While its longer, step-by-step outputs enhance quality for complex tasks, this contributes to its moderate speed.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.07
Completion $0.35

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
DeepInfra
DeepInfra | microsoft/phi-4-reasoning-plus-04-30 32K $0.07 / 1M tokens $0.35 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Free Executions Accuracy Cost Duration
Other Models by microsoft