Author's Description
MAI-DS-R1 is a post-trained variant of DeepSeek-R1 developed by the Microsoft AI team to improve the model’s responsiveness on previously blocked topics while enhancing its safety profile. Built on top of DeepSeek-R1’s reasoning foundation, it integrates 110k examples from the Tulu-3 SFT dataset and 350k internally curated multilingual safety-alignment samples. The model retains strong reasoning, coding, and problem-solving capabilities, while unblocking a wide range of prompts previously restricted in R1. MAI-DS-R1 demonstrates improved performance on harm mitigation benchmarks and maintains competitive results across general reasoning tasks. It surpasses R1-1776 in satisfaction metrics for blocked queries and reduces leakage in harmful content categories. The model is based on a transformer MoE architecture and is suitable for general-purpose use cases, excluding high-stakes domains such as legal, medical, or autonomous systems.
Key Specifications
Supported Parameters
This model supports the following parameters:
Features
This model supports the following features:
Performance Summary
Microsoft's MAI-DS-R1 demonstrates moderate speed, ranking in the 35th percentile across benchmarks, and competitive pricing, placing in the 43rd percentile. A standout feature is its reliability: a 100% success rate across all evaluated benchmarks, indicating minimal technical failures.

The model performs strongly across diverse benchmark categories. It achieves perfect accuracy in Instruction Following (Baseline) and Ethics (Baseline), showing precision in adhering to complex directives and in ethical reasoning. It is among the most accurate models at its price point for Ethics and General Knowledge, where it also scores 100%. MAI-DS-R1 excels in Coding (95% accuracy) and maintains strong results in Email Classification (99%) and Reasoning (84%). One Instruction Following benchmark showed 78% accuracy, which still placed it in the 91st percentile.

Its key strengths are high accuracy on critical tasks, particularly instruction following, ethics, and general knowledge, combined with exceptional reliability. No significant weaknesses were identified in its performance metrics, making it a robust general-purpose model.
Model Pricing
Current Pricing
Feature | Price (per 1M tokens) |
---|---|
Prompt | $0.20 |
Completion | $0.80 |
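At these rates, per-request cost follows directly from the token counts. A minimal sketch of the arithmetic (the helper name and defaults are illustrative, not part of any official API):

```python
def estimate_cost_usd(prompt_tokens: int, completion_tokens: int,
                      prompt_price: float = 0.2, completion_price: float = 0.8) -> float:
    """Estimate request cost in USD given per-1M-token prices."""
    return (prompt_tokens * prompt_price + completion_tokens * completion_price) / 1_000_000

# e.g. a 10K-token prompt with a 2K-token completion:
# 10_000 * 0.2/1e6 + 2_000 * 0.8/1e6 = $0.0036
```

For example, a 10K-token prompt with a 2K-token completion costs roughly a third of a cent at these prices.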
Price History
Available Endpoints
Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
---|---|---|---|---|
Chutes | microsoft/mai-ds-r1 | 163K | $0.20 / 1M tokens | $0.80 / 1M tokens |
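Providers listing this endpoint typically expose an OpenAI-compatible chat completions API, with the endpoint name above used as the model identifier. A minimal request-body sketch (the API shape is an assumption based on common provider conventions; only the model identifier comes from the table):

```python
import json

# Hypothetical OpenAI-compatible chat request; only the model
# identifier "microsoft/mai-ds-r1" is taken from the endpoints table.
payload = {
    "model": "microsoft/mai-ds-r1",
    "messages": [
        {"role": "user", "content": "Summarize the MoE architecture in one sentence."}
    ],
    "max_tokens": 512,
}
body = json.dumps(payload)  # serialized request body, ready to POST
```

The serialized `body` would then be sent to the provider's chat completions URL with the usual `Authorization: Bearer <key>` header.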
Benchmark Results
Benchmark | Category | Reasoning | Free | Executions | Accuracy | Cost | Duration |
---|---|---|---|---|---|---|---|
Other Models by microsoft
Model | Released | Params | Context | Modalities | Speed | Ability | Cost |
---|---|---|---|---|---|---|---|
Microsoft: Phi 4 Reasoning Plus | May 01, 2025 | ~14B | 32K | Text input; Text output | ★ | ★★★ | $$$$ |
Microsoft: Phi 4 Multimodal Instruct | Mar 07, 2025 | ~5.6B | 131K | Text input; Image input; Text output | ★★ | ★★ | $$ |
Microsoft: Phi 4 | Jan 09, 2025 | ~14B | 16K | Text input; Text output | ★★★★ | ★★★★ | $$ |
Microsoft: Phi-3.5 Mini 128K Instruct | Aug 20, 2024 | ~3.8B | 128K | Text input; Text output | ★ | ★★ | $$ |
Microsoft: Phi-3 Mini 128K Instruct | May 25, 2024 | ~3.8B | 128K | Text input; Text output | ★★★ | ★★★ | $$ |
Microsoft: Phi-3 Medium 128K Instruct | May 23, 2024 | ~14B | 128K | Text input; Text output | ★★ | ★ | $$$$ |
WizardLM-2 8x22B | Apr 15, 2024 | 22B | 65K | Text input; Text output | ★★★ | ★★ | $$$ |