Author's Description
DeepSeek R1 Distill Llama 8B is a distilled large language model based on [Llama-3.1-8B-Instruct](/meta-llama/llama-3.1-8b-instruct), fine-tuned on outputs from [DeepSeek R1](/deepseek/deepseek-r1). This distillation from R1's outputs enables competitive performance comparable to larger frontier models across multiple benchmarks, including:

- AIME 2024 pass@1: 50.4
- MATH-500 pass@1: 89.1
- CodeForces rating: 1205

Hugging Face:

- [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B)
- [DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B)
Performance Summary
DeepSeek R1 Distill Llama 8B, a distilled large language model based on Llama-3.1-8B-Instruct, demonstrates a compelling balance of performance and cost-efficiency. While its speed ranking places it among the slower models (8th percentile), it consistently offers highly competitive pricing (84th percentile), making it a cost-effective option for various applications.

In terms of benchmark performance, the model exhibits notable strengths in specific areas. It achieves a respectable 81.0% accuracy in the "Coding (Baseline)" benchmark, placing it in the 56th percentile, and performs well in "Reasoning (Baseline)" with 66.0% accuracy (62nd percentile). These results, coupled with impressive AIME 2024 pass@1 (50.4) and MATH-500 pass@1 (89.1) scores, highlight its capabilities in complex problem-solving and technical domains.

However, the model shows weaknesses in "Ethics (Baseline)" and "Email Classification (Baseline)," with accuracy scores of 74.0% (22nd percentile) and 92.0% (22nd percentile) respectively. Its "General Knowledge (Baseline)" performance is also modest at 72.8% accuracy (25th percentile).

Despite these areas for improvement, its competitive performance in coding and reasoning, combined with its cost-effectiveness, positions DeepSeek R1 Distill Llama 8B as a valuable option for applications where budget and specific technical proficiencies are key considerations.
Model Pricing
Current Pricing
| Feature | Price (per 1M tokens) |
|---|---|
| Prompt | $0.04 |
| Completion | $0.04 |
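Since prompt and completion tokens share the same flat rate, per-request cost is simple arithmetic. A minimal sketch (the helper name `estimate_cost` is ours, not part of any provider SDK):

```python
# Flat rates from the pricing table above: $0.04 per 1M tokens each way.
PROMPT_PRICE_PER_M = 0.04      # USD per 1M prompt tokens
COMPLETION_PRICE_PER_M = 0.04  # USD per 1M completion tokens

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated USD cost of one request at the listed flat rates."""
    return (prompt_tokens * PROMPT_PRICE_PER_M
            + completion_tokens * COMPLETION_PRICE_PER_M) / 1_000_000

# e.g. a 2,000-token prompt producing a 500-token completion:
print(f"${estimate_cost(2000, 500):.6f}")  # → $0.000100
```

At these rates, even a million tokens round-trip costs well under a dime, which is what the 84th-percentile pricing ranking above reflects.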
Available Endpoints
| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
|---|---|---|---|---|
| Novita | deepseek/deepseek-r1-distill-llama-8b | 32K | $0.04 / 1M tokens | $0.04 / 1M tokens |
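The endpoint name above is the model identifier you pass in an OpenAI-compatible chat-completions request. A minimal sketch — `BASE_URL` and `API_KEY` are placeholders to substitute with your provider's actual values, not documented endpoints:

```python
import json

BASE_URL = "https://example-provider.invalid/v1/chat/completions"  # placeholder
API_KEY = "YOUR_API_KEY"  # placeholder

# Request body: the model field uses the endpoint name from the table above.
payload = {
    "model": "deepseek/deepseek-r1-distill-llama-8b",
    "messages": [
        {"role": "user", "content": "Prove that the sum of two odd integers is even."}
    ],
    "max_tokens": 1024,
}

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# To actually send the request (requires the third-party `requests` package):
# import requests
# resp = requests.post(BASE_URL, headers=headers, data=json.dumps(payload))
# print(resp.json()["choices"][0]["message"]["content"])
print(json.dumps(payload, indent=2))
```

Note the 32K context length at this endpoint: prompt plus completion tokens must fit within that window.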
Benchmark Results

| Benchmark | Accuracy |
|---|---|
| Coding (Baseline) | 81.0% |
| Reasoning (Baseline) | 66.0% |
| Ethics (Baseline) | 74.0% |
| Email Classification (Baseline) | 92.0% |
| General Knowledge (Baseline) | 72.8% |
Other Models by deepseek

| Model | Released | Params | Context | Modalities | Speed | Ability | Cost |
|---|---|---|---|---|---|---|---|
| DeepSeek: R1 Distill Qwen 7B | May 30, 2025 | 7B | 131K | Text input / output | ★ | ★ | $$$$ |
| DeepSeek: Deepseek R1 0528 Qwen3 8B | May 29, 2025 | 8B | 131K | Text input / output | ★ | ★★★★★ | $$$ |
| DeepSeek: R1 0528 | May 28, 2025 | ~671B | 128K | Text input / output | ★ | ★★★★★ | $$$$$ |
| DeepSeek: DeepSeek Prover V2 | Apr 30, 2025 | ~671B | 131K | Text input / output | ★★★★ | ★★★★★ | $$$$ |
| DeepSeek: DeepSeek V3 0324 | Mar 24, 2025 | ~685B | 163K | Text input / output | ★★★ | ★★★★★ | $$$ |
| DeepSeek: R1 Distill Qwen 1.5B | Jan 31, 2025 | 1.5B | 131K | Text input / output | ★★★ | ★ | $$$ |
| DeepSeek: R1 Distill Qwen 32B | Jan 29, 2025 | 32B | 131K | Text input / output | ★ | ★★★★★ | $$$ |
| DeepSeek: R1 Distill Qwen 14B | Jan 29, 2025 | 14B | 64K | Text input / output | ★ | ★★★ | $$$ |
| DeepSeek: R1 Distill Llama 70B | Jan 23, 2025 | 70B | 131K | Text input / output | ★ | ★★★★★ | $$$$ |
| DeepSeek: R1 | Jan 20, 2025 | ~671B | 128K | Text input / output | ★★ | ★★★★ | $$$$ |
| DeepSeek: DeepSeek V3 | Dec 26, 2024 | — | 163K | Text input / output | ★★★ | ★★★★ | $$$ |