Microsoft: Phi-3 Medium 128K Instruct

Text input, text output
Author's Description

Phi-3 Medium 128K is a powerful 14-billion-parameter model designed for advanced language understanding, reasoning, and instruction following. Optimized through supervised fine-tuning and preference adjustments, it excels at tasks involving common sense, mathematics, logical reasoning, and code processing. At the time of release, Phi-3 Medium demonstrated state-of-the-art performance among lightweight models. On the MMLU-Pro eval, the model even comes close to Llama3 70B-level performance. For a 4K context length, try [Phi-3 Medium 4K](/models/microsoft/phi-3-medium-4k-instruct).

Key Specifications
- Cost: $$$$
- Context: 128K
- Parameters: 14B (Rumoured)
- Released: May 23, 2024
Supported Parameters

This model supports the following parameters:

- Tools
- Tool Choice
- Top P
- Max Tokens
- Temperature
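
As a rough illustration of how the sampling parameters map onto an OpenAI-compatible chat request, here is a minimal sketch in Python. The base URL, API key environment variable, and choice of the `openai` client library are assumptions, not details from this page; only the model slug matches the endpoint listed further below. Tools and Tool Choice are shown separately in the Features section sketch.

```python
# Minimal sketch of a request using the supported sampling parameters
# (Temperature, Top P, Max Tokens). Base URL and API key variable are assumptions.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://example.com/api/v1",   # assumed OpenAI-compatible endpoint
    api_key=os.environ["EXAMPLE_API_KEY"],   # assumed environment variable
)

response = client.chat.completions.create(
    model="microsoft/phi-3-medium-128k-instruct",
    messages=[
        {"role": "user", "content": "Summarize the Phi-3 model family in two sentences."},
    ],
    temperature=0.7,  # Temperature: sampling randomness
    top_p=0.9,        # Top P: nucleus sampling cutoff
    max_tokens=256,   # Max Tokens: cap on completion length
)

print(response.choices[0].message.content)
```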
Features

This model supports the following features:

Tools
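
Since tool use is the one listed feature, the sketch below shows how the Tools and Tool Choice parameters could be exercised through an OpenAI-compatible client. The endpoint URL, API key variable, and the `get_weather` tool definition are illustrative assumptions, not part of this page.

```python
# Minimal tool-calling sketch; the get_weather tool is hypothetical.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://example.com/api/v1",   # assumed OpenAI-compatible endpoint
    api_key=os.environ["EXAMPLE_API_KEY"],   # assumed environment variable
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for illustration
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="microsoft/phi-3-medium-128k-instruct",
    messages=[{"role": "user", "content": "What is the weather in Seattle right now?"}],
    tools=tools,         # Tools
    tool_choice="auto",  # Tool Choice: let the model decide whether to call a tool
)

# If the model chose to call the tool, the structured call appears here instead of text.
print(response.choices[0].message.tool_calls)
```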
Performance Summary

Microsoft's Phi-3 Medium 128K Instruct, released on May 23, 2024, is positioned as a powerful 14-billion-parameter model optimized for advanced language understanding and reasoning. It consistently ranks among the fastest models evaluated and offers highly competitive pricing across all benchmarks. Despite the strong claims in its description, including performance close to Llama3 70B on MMLU-Pro, the benchmark results reported here indicate significant limitations: the model scored 0.0% accuracy on the General Knowledge, Ethics, Instruction Following, Reasoning, and Coding benchmarks, and only 4.0% accuracy (4th percentile) on Email Classification. While its speed and cost efficiency are exceptional, the lack of correct responses across all tested categories is a critical weakness: the model processes requests efficiently but does not produce correct or relevant outputs for these specific tasks. Further evaluation with more detailed metrics beyond baseline accuracy would help clarify its true capabilities and identify areas for improvement.

Model Pricing

Current Pricing

| Feature    | Price (per 1M tokens) |
|------------|-----------------------|
| Prompt     | $1                    |
| Completion | $1                    |
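
Because prompt and completion tokens are billed at the same $1 per 1M tokens, a per-request estimate is simply total tokens divided by one million. A small sketch, where the token counts are made-up example values:

```python
# Cost estimate at the listed rates: $1 per 1M prompt tokens, $1 per 1M completion tokens.
PROMPT_PRICE_PER_M = 1.00
COMPLETION_PRICE_PER_M = 1.00

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Return the estimated USD cost of a single request at the listed rates."""
    return (
        prompt_tokens * PROMPT_PRICE_PER_M
        + completion_tokens * COMPLETION_PRICE_PER_M
    ) / 1_000_000

# Example: a 100,000-token prompt (well within the 128K context) and a 1,000-token reply.
print(f"${estimate_cost(100_000, 1_000):.4f}")  # $0.1010
```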


Available Endpoints
| Provider | Endpoint Name                        | Context Length | Pricing (Input) | Pricing (Output) |
|----------|--------------------------------------|----------------|-----------------|------------------|
| Azure    | microsoft/phi-3-medium-128k-instruct | 128K           | $1 / 1M tokens  | $1 / 1M tokens   |
Benchmark Results
| Benchmark | Category | Reasoning Strategy | Free | Executions | Accuracy | Cost | Duration |
|-----------|----------|--------------------|------|------------|----------|------|----------|