Microsoft: Phi-3 Medium 128K Instruct

Text input → Text output
Author's Description

Phi-3 128K Medium is a powerful 14-billion parameter model designed for advanced language understanding, reasoning, and instruction following. Optimized through supervised fine-tuning and preference adjustments, it excels in tasks involving common sense, mathematics, logical reasoning, and code processing. At time of release, Phi-3 Medium demonstrated state-of-the-art performance among lightweight models. In the MMLU-Pro eval, the model even comes close to a Llama3 70B level of performance. For 4k context length, try [Phi-3 Medium 4K](/models/microsoft/phi-3-medium-4k-instruct).

Key Specifications
| Attribute | Value |
|---|---|
| Cost | $$$$ |
| Context | 128K |
| Parameters | 14B (Rumoured) |
| Released | May 23, 2024 |
Supported Parameters

This model supports the following parameters:

- Tool Choice
- Top P
- Temperature
- Tools
- Max Tokens
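
A minimal request sketch using these parameters, assuming an OpenAI-compatible chat completions endpoint; the base URL, API key environment variable, and default values below are placeholders, not confirmed settings:

```python
# Minimal sketch: chat completion request with the supported sampling parameters.
# Assumes an OpenAI-compatible gateway; BASE_URL and API_KEY are placeholders.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ.get("BASE_URL", "https://example-gateway/api/v1"),  # placeholder gateway URL
    api_key=os.environ["API_KEY"],
)

response = client.chat.completions.create(
    model="microsoft/phi-3-medium-128k-instruct",
    messages=[{"role": "user", "content": "Summarize the Phi-3 Medium model in one sentence."}],
    temperature=0.7,   # sampling temperature
    top_p=0.9,         # nucleus sampling cutoff
    max_tokens=256,    # cap on completion length
)
print(response.choices[0].message.content)
```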
Features

This model supports the following features:

- Tools
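
Because tool use is listed as a supported feature, here is a sketch of how the `tools` and `tool_choice` parameters might be passed, reusing the hypothetical client from the previous example; the `get_weather` function schema is illustrative only:

```python
# Sketch of tool calling via the `tools` and `tool_choice` parameters.
# The get_weather tool is a hypothetical example, not part of the model or provider.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="microsoft/phi-3-medium-128k-instruct",
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
    tool_choice="auto",  # let the model decide whether to call the tool
)

# If the model chose to call the tool, inspect the requested call.
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)
```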
Performance Summary

Microsoft's Phi-3 Medium 128K Instruct, released on May 23, 2024, demonstrates exceptional speed and cost efficiency, consistently ranking among the fastest and most competitively priced models across six benchmarks. This 14-billion-parameter model is designed for advanced language understanding and reasoning, excelling in common sense, mathematics, logical reasoning, and code processing. Its benchmark results vary widely: it achieved a strong 76% accuracy in Reasoning, indicating notable strength in complex problem-solving, but scored very low or zero in Coding, Instruction Following, Ethics, and General Knowledge, and only 4% in Email Classification. The model's description highlights state-of-the-art performance among lightweight models at release, even approaching Llama3 70B levels on MMLU-Pro, which contrasts with several of the baseline results here; examining how these baseline tests differ from the MMLU-Pro evaluation would help reconcile the gap.

Model Pricing

Current Pricing

| Feature | Price (per 1M tokens) |
|---|---|
| Prompt | $1 |
| Completion | $1 |
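
With a flat $1 per 1M tokens on both prompt and completion, estimating a request's cost is a single multiplication; a small sketch with illustrative token counts:

```python
# Estimate request cost at $1 / 1M tokens for both prompt and completion.
PROMPT_PRICE = 1.00      # USD per 1M prompt tokens
COMPLETION_PRICE = 1.00  # USD per 1M completion tokens

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    return (prompt_tokens * PROMPT_PRICE + completion_tokens * COMPLETION_PRICE) / 1_000_000

# Example: a 100K-token prompt (near the 128K context limit) with a 1K-token reply
print(f"${estimate_cost(100_000, 1_000):.4f}")  # -> $0.1010
```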


Available Endpoints
| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
|---|---|---|---|---|
| Azure | microsoft/phi-3-medium-128k-instruct | 128K | $1 / 1M tokens | $1 / 1M tokens |
Benchmark Results
| Benchmark | Category | Reasoning | Free Executions | Accuracy | Cost | Duration |
|---|---|---|---|---|---|---|