Microsoft: Phi-3.5 Mini 128K Instruct

Input: Text · Output: Text
Author's Description

Phi-3.5 models are lightweight, state-of-the-art open models. They were trained on Phi-3 datasets that include both synthetic data and filtered, publicly available website data, with a focus on high-quality, reasoning-dense properties. Phi-3.5 Mini has 3.8B parameters and is a dense decoder-only transformer using the same tokenizer as [Phi-3 Mini](/models/microsoft/phi-3-mini-128k-instruct). The models underwent a rigorous enhancement process, incorporating supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures. When assessed against benchmarks testing common sense, language understanding, math, code, long context, and logical reasoning, Phi-3.5 models showed robust, state-of-the-art performance among models with fewer than 13 billion parameters.

Key Specifications
Cost
$$
Context
128K
Parameters
3.8B
Released
Aug 20, 2024
Supported Parameters

This model supports the following parameters:

Tool Choice, Top P, Temperature, Tools, Max Tokens
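As a sketch of how these parameters map onto a request, the snippet below builds a chat-completion payload in the OpenAI-compatible style many providers use for this model. The endpoint, tool schema, and parameter values are illustrative assumptions, not taken from this page.

```python
import json

# Hypothetical chat-completions payload (OpenAI-compatible shape);
# values and the example tool are assumptions for illustration.
payload = {
    "model": "microsoft/phi-3.5-mini-128k-instruct",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "temperature": 0.7,   # Temperature: sampling randomness
    "top_p": 0.9,         # Top P: nucleus-sampling cutoff
    "max_tokens": 256,    # Max Tokens: cap on completion length
    "tools": [{           # Tools: function definitions the model may call
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "tool_choice": "auto",  # Tool Choice: let the model decide when to call
}

print(json.dumps(payload, indent=2))
```

The payload would then be POSTed to the provider's chat-completions endpoint; only the five parameters listed above are supported for this model.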
Features

This model supports the following features:

Tools
Performance Summary

Microsoft's Phi-3.5 Mini 128K Instruct, a lightweight 3.8B-parameter model, demonstrates exceptional speed and cost efficiency. It consistently ranks among the fastest models and offers highly competitive pricing across benchmarks, and its 86th-percentile reliability ranking reflects strong consistency with few technical issues.

Across benchmark categories, Phi-3.5 Mini is strongest in Ethics, achieving perfect 100% accuracy and ranking as the most accurate model at its price point and speed. It also performs well in Email Classification (89% accuracy) and General Knowledge (68.3% accuracy), though its percentile rankings in these areas are modest. Its key weaknesses are Coding and Instruction Following, at only 22% and 0% accuracy respectively, indicating significant limitations in these more complex domains, and its Reasoning accuracy of 46.9% also leaves room for improvement. Long durations on the Reasoning and General Knowledge benchmarks suggest that while the model can process complex queries, it may take longer to do so.

Model Pricing

Current Pricing

Feature | Price (per 1M tokens)
Prompt | $0.10
Completion | $0.10
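With prompt and completion tokens both priced at $0.10 per million, per-request cost is simple arithmetic. The helper below is a minimal sketch; the token counts in the example are assumptions chosen to fit within the 128K context.

```python
# $0.10 per 1M tokens for both prompt and completion
PROMPT_PRICE = 0.10 / 1_000_000      # dollars per prompt token
COMPLETION_PRICE = 0.10 / 1_000_000  # dollars per completion token

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated cost in dollars for a single request."""
    return prompt_tokens * PROMPT_PRICE + completion_tokens * COMPLETION_PRICE

# e.g. a 100K-token prompt (within the 128K context) with a 1K-token reply
print(f"${request_cost(100_000, 1_000):.4f}")  # → $0.0101
```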


Available Endpoints
Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output)
Azure | microsoft/phi-3.5-mini-128k-instruct | 128K | $0.10 / 1M tokens | $0.10 / 1M tokens
Benchmark Results