DeepSeek: R1 Distill Llama 8B

Modalities: text input → text output
Author's Description

DeepSeek R1 Distill Llama 8B is a distilled large language model based on [Llama-3.1-8B-Instruct](/meta-llama/llama-3.1-8b-instruct), fine-tuned on outputs from [DeepSeek R1](/deepseek/deepseek-r1). The distillation achieves strong performance across multiple benchmarks, including:

- AIME 2024 pass@1: 50.4
- MATH-500 pass@1: 89.1
- CodeForces Rating: 1205

By learning from DeepSeek R1's outputs, the model achieves performance competitive with larger frontier models.

Hugging Face:

- [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B)
- [DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B)

Key Specifications
Cost
$$
Context
32K
Parameters
8B
Released
Feb 07, 2025
Supported Parameters

This model supports the following parameters:

Include Reasoning, Stop, Max Tokens, Top P, Frequency Penalty, Reasoning, Logit Bias, Min P, Seed, Temperature, Presence Penalty
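As a rough illustration of how these parameters map onto a request, here is a minimal sketch that builds a payload for an OpenAI-compatible chat completions API. The exact field names (in particular `include_reasoning`) and default values are assumptions, not confirmed by this page; the model slug is taken from the endpoint table below.

```python
# Sketch: a request payload exercising this model's supported sampling
# parameters, assuming an OpenAI-compatible chat completions API.
# Field names and values are illustrative assumptions.

def build_payload(prompt: str) -> dict:
    return {
        "model": "deepseek/deepseek-r1-distill-llama-8b",  # slug from the endpoint table
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,          # Max Tokens
        "temperature": 0.6,          # Temperature
        "top_p": 0.95,               # Top P
        "min_p": 0.05,               # Min P
        "frequency_penalty": 0.0,    # Frequency Penalty
        "presence_penalty": 0.0,     # Presence Penalty
        "seed": 42,                  # Seed (best-effort reproducibility)
        "stop": ["\n\n"],            # Stop sequences
        "logit_bias": {},            # Logit Bias (token id -> bias)
        "include_reasoning": True,   # Include Reasoning (hypothetical field name)
    }

payload = build_payload("What is 17 * 24?")
```

Such a payload would then be POSTed as JSON to the provider's chat completions endpoint with an API key in the `Authorization` header.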
Features

This model supports the following features:

Reasoning
Performance Summary

DeepSeek R1 Distill Llama 8B, released on February 7, 2025, is a distilled large language model based on Llama-3.1-8B-Instruct and fine-tuned on outputs from DeepSeek R1.

The model's strongest trait is cost-efficiency: its pricing ranks in the 80th percentile across seven benchmarks, consistently among the most competitive. Reliability is another strength, with an 88% success rate and few technical issues. Speed is its main drawback, placing in the 10th percentile due to long response times.

Benchmark performance is varied. The model excels in specialized areas such as AIME 2024 (50.4 pass@1) and MATH-500 (89.1 pass@1), alongside a respectable CodeForces Rating of 1205. On general benchmarks, Coding accuracy is solid at 81.0% (48th percentile) and Reasoning is moderate at 57.1% (41st percentile). Weaknesses appear in Hallucinations (50.0% accuracy) and Instruction Following (25.3% accuracy), both in the lower percentiles. General Knowledge (72.8%) and Ethics (74.0%) also show room for improvement, each falling below the 20th percentile.
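The pass@1 figures above follow the standard pass@k methodology: sample n completions per problem, count the c that pass, and compute an unbiased estimate of the probability that at least one of k samples succeeds. A minimal sketch of that estimator (the sample counts used for this model's scores are not stated on this page):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples drawn, c of them correct,
    estimate the probability that at least one of k samples passes."""
    if n - c < k:
        return 1.0  # too few failures to fill k slots: guaranteed pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# With k = 1 this reduces to the plain fraction of correct samples, c / n.
estimate = pass_at_k(10, 5, 1)  # 0.5
```

A per-benchmark score is then the mean of this estimate over all problems in the set.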

Model Pricing

Current Pricing

| Feature | Price (per 1M tokens) |
| --- | --- |
| Prompt | $0.04 |
| Completion | $0.04 |


Available Endpoints
| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
| --- | --- | --- | --- | --- |
| Novita | deepseek/deepseek-r1-distill-llama-8b | 32K | $0.04 / 1M tokens | $0.04 / 1M tokens |
Benchmark Results