DeepSeek: R1 Distill Llama 70B

Text input · Text output · Free option available
Author's Description

DeepSeek R1 Distill Llama 70B is a distilled large language model based on [Llama-3.3-70B-Instruct](/meta-llama/llama-3.3-70b-instruct), fine-tuned on outputs from [DeepSeek R1](/deepseek/deepseek-r1). The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including:

- AIME 2024 pass@1: 70.0
- MATH-500 pass@1: 94.5
- CodeForces Rating: 1633

By leveraging DeepSeek R1's outputs for fine-tuning, the model delivers performance competitive with larger frontier models.

Key Specifications
- Cost: $$
- Context: 131K tokens
- Parameters: 70B
- Released: Jan 23, 2025
Supported Parameters

This model supports the following parameters:

Stop, Top P, Seed, Min P, Frequency Penalty, Response Format, Max Tokens, Reasoning, Presence Penalty, Include Reasoning, Temperature
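As a sketch of how these parameters map onto a request, the example below assumes an OpenAI-compatible chat completions endpoint (the URL, environment variable name, and exact parameter spellings are assumptions; individual providers may ignore some knobs):

```python
import os
import requests

# Assumed OpenAI-compatible chat completions endpoint; adjust to your gateway.
API_URL = "https://openrouter.ai/api/v1/chat/completions"
API_KEY = os.environ["OPENROUTER_API_KEY"]  # assumed environment variable name

payload = {
    "model": "deepseek/deepseek-r1-distill-llama-70b",
    "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    # Sampling and decoding parameters from the list above
    "temperature": 0.6,
    "top_p": 0.95,
    "min_p": 0.05,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "seed": 42,
    "max_tokens": 2048,
    "stop": ["</answer>"],
}

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```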
Features

This model supports the following features:

Response Format, Reasoning
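A minimal sketch of both feature flags is shown below; the `response_format` shape and the `include_reasoning`/`reasoning` field names follow common OpenAI-compatible conventions and are assumptions rather than guarantees for every provider:

```python
import os
import requests

API_URL = "https://openrouter.ai/api/v1/chat/completions"  # assumed endpoint
API_KEY = os.environ["OPENROUTER_API_KEY"]

payload = {
    "model": "deepseek/deepseek-r1-distill-llama-70b",
    "messages": [
        {"role": "user", "content": 'Return the capital of France as JSON: {"capital": ...}'}
    ],
    # Feature: Response Format -- request a JSON object instead of free text.
    "response_format": {"type": "json_object"},
    # Feature: Reasoning -- ask the provider to return the model's reasoning trace.
    "include_reasoning": True,
}

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=120,
)
resp.raise_for_status()
message = resp.json()["choices"][0]["message"]
print(message.get("reasoning"))  # reasoning trace, if the provider returns one
print(message["content"])        # final JSON answer
```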
Performance Summary

DeepSeek R1 Distill Llama 70B, a distilled model based on Llama-3.3-70B-Instruct, shows a strong overall performance profile. Its speed is moderate, ranking in the 20th percentile, while its pricing is competitive, placing in the 50th percentile across benchmarks. A standout feature is its reliability: a 97% success rate indicates consistent operation with minimal technical failures. The model excels at Instruction Following, achieving perfect accuracy on one benchmark, and shows strong capabilities in General Knowledge (99.8% accuracy) and Reasoning (84.0% accuracy), performing well above average in both categories. Ethics performance is solid at 99.0% accuracy, though the run time on that benchmark is notably long. Coding performance is respectable at 87.0% accuracy. A notable weakness appears on a second Instruction Following benchmark, where accuracy drops to 58.0%, suggesting variability on certain instruction sets. Overall, the model uses its distillation effectively to deliver competitive results, particularly on accuracy-critical tasks and general knowledge, making it a reliable choice for diverse applications.

Model Pricing

Current Pricing

| Feature | Price (per 1M tokens) |
|---|---|
| Prompt | $0.10 |
| Completion | $0.40 |
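As a quick worked example of what these rates imply (a sketch using the default prices listed above, ignoring provider-specific discounts):

```python
# Estimate request cost from the default pricing above:
# $0.10 per 1M prompt tokens, $0.40 per 1M completion tokens.
PROMPT_PRICE = 0.10 / 1_000_000
COMPLETION_PRICE = 0.40 / 1_000_000

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    return prompt_tokens * PROMPT_PRICE + completion_tokens * COMPLETION_PRICE

# Example: a 2,000-token prompt with an 800-token completion
# costs roughly 2000 * $0.10/1M + 800 * $0.40/1M = $0.00052.
print(f"${estimate_cost(2_000, 800):.5f}")
```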

Price History

Available Endpoints
| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
|---|---|---|---|---|
| DeepInfra | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.10 / 1M tokens | $0.40 / 1M tokens |
| InferenceNet | deepseek/deepseek-r1-distill-llama-70b | 128K | $0.0259 / 1M tokens | $0.104 / 1M tokens |
| Lambda | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.20 / 1M tokens | $0.60 / 1M tokens |
| Phala | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.0259 / 1M tokens | $0.104 / 1M tokens |
| GMICloud | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.0259 / 1M tokens | $0.104 / 1M tokens |
| Nebius | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.25 / 1M tokens | $0.75 / 1M tokens |
| SambaNova | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.70 / 1M tokens | $1.40 / 1M tokens |
| Groq | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.75 / 1M tokens | $0.99 / 1M tokens |
| Novita | deepseek/deepseek-r1-distill-llama-70b | 32K | $0.80 / 1M tokens | $0.80 / 1M tokens |
| Together | deepseek/deepseek-r1-distill-llama-70b | 131K | $2.00 / 1M tokens | $2.00 / 1M tokens |
| Cerebras | deepseek/deepseek-r1-distill-llama-70b | 32K | $0.0259 / 1M tokens | $0.104 / 1M tokens |
| Chutes | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.0259 / 1M tokens | $0.104 / 1M tokens |
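If the routing layer supports provider preferences (an OpenRouter-style `provider` routing object is assumed here; the field names are not guaranteed for other gateways), a request can be pinned to specific endpoints from the table above, for example to trade cost against context length:

```python
import os
import requests

API_URL = "https://openrouter.ai/api/v1/chat/completions"  # assumed gateway endpoint
API_KEY = os.environ["OPENROUTER_API_KEY"]

payload = {
    "model": "deepseek/deepseek-r1-distill-llama-70b",
    "messages": [{"role": "user", "content": "Summarize the CAP theorem in two sentences."}],
    # Assumed OpenRouter-style provider routing: prefer the cheaper 131K-context
    # endpoints and do not fall back to providers outside this list.
    "provider": {
        "order": ["DeepInfra", "Lambda"],
        "allow_fallbacks": False,
    },
}

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```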
Benchmark Results
(Benchmark table columns: Benchmark, Category, Reasoning, Free, Executions, Accuracy, Cost, Duration; individual benchmark rows are not reproduced here.)