DeepSeek: R1 Distill Llama 70B

Modalities: text input, text output
Author's Description

DeepSeek R1 Distill Llama 70B is a distilled large language model based on [Llama-3.3-70B-Instruct](/meta-llama/llama-3.3-70b-instruct), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). The model combines advanced distillation techniques to achieve high performance across...

Key Specifications

| Spec | Value |
|------|-------|
| Cost | $$ |
| Context | 131K tokens |
| Parameters | 70B |
| Released | Jan 23, 2025 |
Supported Parameters

This model supports the following parameters:

Temperature, Include Reasoning, Reasoning, Presence Penalty, Max Tokens, Seed, Min P, Response Format, Frequency Penalty, Top P, Stop
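As a sketch of how these parameters are typically passed, the snippet below sends a chat completion request through an OpenRouter-style API. The endpoint URL, header, and snake_case parameter spellings follow OpenRouter's public conventions; treat the details as assumptions rather than official documentation for this page.

```python
# Minimal sketch: calling DeepSeek R1 Distill Llama 70B with the supported
# sampling parameters via an OpenRouter-style chat completions endpoint.
# The URL and parameter spellings are assumptions based on OpenRouter's
# public API conventions, not taken from this page.
import os
import requests

response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "deepseek/deepseek-r1-distill-llama-70b",
        "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
        "temperature": 0.6,        # sampling temperature
        "top_p": 0.95,             # nucleus sampling cutoff
        "min_p": 0.0,              # minimum token probability filter
        "presence_penalty": 0.0,   # penalize tokens already present
        "frequency_penalty": 0.0,  # penalize tokens by frequency
        "max_tokens": 2048,        # cap on completion length
        "seed": 42,                # best-effort reproducibility
        "stop": ["</answer>"],     # optional stop sequence(s)
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```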
Features

This model supports the following features:

Reasoning, Response Format
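To illustrate the two features together, the hedged sketch below asks for the model's reasoning trace alongside a JSON-constrained answer. The `include_reasoning` flag, the `response_format` object, and the `reasoning` field on the reply follow OpenRouter's conventions and should be treated as assumptions here.

```python
# Sketch of the Reasoning and Response Format features, assuming
# OpenRouter-style request and response shapes.
import os
import requests

payload = {
    "model": "deepseek/deepseek-r1-distill-llama-70b",
    "messages": [{"role": "user", "content": "Return the prime factors of 360 as JSON."}],
    "include_reasoning": True,                   # ask for the reasoning trace
    "response_format": {"type": "json_object"},  # constrain output to valid JSON
}
r = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json=payload,
    timeout=120,
)
r.raise_for_status()
message = r.json()["choices"][0]["message"]
print(message.get("reasoning"))  # reasoning trace, if the provider returns one
print(message["content"])        # the JSON answer itself
```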
Performance Summary

DeepSeek R1 Distill Llama 70B, released on January 23, 2025, is a distilled large language model with a 131,072-token context window, based on Llama-3.3-70B-Instruct and trained on outputs from DeepSeek R1. The model exhibits moderate speed, ranking in the 20th percentile across benchmarks, and competitive pricing in the 51st percentile. Its reliability is exceptional: a 97% success rate across 9 benchmarks indicates minimal technical failures. The model performs strongly in General Knowledge (99.8% accuracy, 80th percentile) and Reasoning (84.0% accuracy, 66th percentile), and achieved perfect accuracy on one Instruction Following benchmark, showing it can adhere precisely to complex directives. Coding performance is solid at 87.0% accuracy (55th percentile), and Ethics remains high at 99.0% accuracy (54th percentile). A key strength is its showing on specialized benchmarks: MATH-500 (94.5% pass@1), AIME 2024 (70.0% pass@1), and a CodeForces rating of 1633 highlight its mathematical and algorithmic ability. A relative weakness is its Hallucinations (Baseline) score of 90.0% accuracy, which, while decent, leaves room for improvement in acknowledging uncertainty; its Mathematics (Baseline) score of 79.0% (34th percentile) likewise indicates it is not a top performer on general mathematical tests.
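For readers unfamiliar with the pass@1 metric cited above: it is the expected fraction of problems solved on the first sampled attempt. The standard unbiased estimator (from Chen et al., 2021, the HumanEval paper) draws $n$ samples per problem, counts the $c$ that pass, and computes

$$\text{pass@}k \;=\; \mathbb{E}_{\text{problems}}\!\left[\,1 - \frac{\binom{n-c}{k}}{\binom{n}{k}}\,\right], \qquad \text{pass@}1 \;=\; \mathbb{E}_{\text{problems}}\!\left[\frac{c}{n}\right].$$

Whether the numbers on this page use this exact estimator is not stated; it is given here only as the conventional definition.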

Model Pricing

Current Pricing

| Feature | Price (per 1M tokens) |
|---------|-----------------------|
| Prompt | $0.70 |
| Completion | $0.80 |
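As a quick worked example of what these rates mean in practice, the sketch below computes the dollar cost of a single request from its token counts. The rates come from the table above; the token counts are hypothetical, chosen only for illustration.

```python
# Worked example: per-request cost at $0.70 / 1M prompt tokens and
# $0.80 / 1M completion tokens (rates from the pricing table above).
PROMPT_RATE = 0.70 / 1_000_000      # dollars per prompt token
COMPLETION_RATE = 0.80 / 1_000_000  # dollars per completion token

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Return the dollar cost of one request."""
    return prompt_tokens * PROMPT_RATE + completion_tokens * COMPLETION_RATE

# Hypothetical request: 2,000 prompt tokens, 8,000 completion tokens
# (reasoning models often emit long traces, so completions tend to dominate).
print(f"${request_cost(2_000, 8_000):.6f}")  # -> $0.007800
```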


Available Endpoints
| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
|----------|---------------|----------------|-----------------|------------------|
| DeepInfra | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.70 / 1M tokens | $0.80 / 1M tokens |
| InferenceNet | deepseek/deepseek-r1-distill-llama-70b | 128K | $0.70 / 1M tokens | $0.80 / 1M tokens |
| Lambda | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.70 / 1M tokens | $0.80 / 1M tokens |
| Phala | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.70 / 1M tokens | $0.80 / 1M tokens |
| GMICloud | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.70 / 1M tokens | $0.80 / 1M tokens |
| Nebius | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.70 / 1M tokens | $0.80 / 1M tokens |
| SambaNova | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.70 / 1M tokens | $0.80 / 1M tokens |
| Groq | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.70 / 1M tokens | $0.80 / 1M tokens |
| Novita | deepseek/deepseek-r1-distill-llama-70b | 32K | $0.70 / 1M tokens | $0.80 / 1M tokens |
| Together | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.70 / 1M tokens | $0.80 / 1M tokens |
| Cerebras | deepseek/deepseek-r1-distill-llama-70b | 32K | $0.70 / 1M tokens | $0.80 / 1M tokens |
| Chutes | deepseek/deepseek-r1-distill-llama-70b | 131K | $0.70 / 1M tokens | $0.80 / 1M tokens |
| Novita | deepseek/deepseek-r1-distill-llama-70b | 8K | $0.80 / 1M tokens | $0.80 / 1M tokens |
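When several providers serve the same model at different context lengths, OpenRouter-style routers let a caller express a provider preference per request. The sketch below uses the `provider` routing object with `order` and `allow_fallbacks`, which follows OpenRouter's documented conventions; treat the exact field names as assumptions rather than something confirmed by this page.

```python
# Sketch: steering a request toward specific providers from the table above,
# assuming an OpenRouter-style "provider" routing object.
import os
import requests

r = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "deepseek/deepseek-r1-distill-llama-70b",
        "messages": [{"role": "user", "content": "Summarize the CAP theorem."}],
        # Prefer Groq, then DeepInfra; fall back to other providers if both fail.
        "provider": {"order": ["Groq", "DeepInfra"], "allow_fallbacks": True},
    },
    timeout=120,
)
r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])
```

Note that provider choice can matter beyond price: per the table, Novita and Cerebras expose shorter context windows (32K or 8K) than the 131K offered elsewhere.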