DeepSeek: R1 Distill Llama 8B

Text input → Text output
Author's Description

DeepSeek R1 Distill Llama 8B is a distilled large language model based on [Llama-3.1-8B-Instruct](/meta-llama/llama-3.1-8b-instruct), fine-tuned on outputs from [DeepSeek R1](/deepseek/deepseek-r1). Distillation allows the model to achieve strong performance across multiple benchmarks, including:

- AIME 2024 pass@1: 50.4
- MATH-500 pass@1: 89.1
- CodeForces Rating: 1205

Fine-tuning on DeepSeek R1's outputs gives the model performance competitive with larger frontier models.

Hugging Face:

- [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B)
- [DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B)

Key Specifications
Cost
$$
Context
32K
Parameters
8B
Released
Feb 07, 2025
Supported Parameters

This model supports the following parameters:

- Seed
- Max Tokens
- Presence Penalty
- Frequency Penalty
- Logit Bias
- Include Reasoning
- Temperature
- Top P
- Stop
- Min P
- Reasoning
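As a sketch of how these parameters might be passed, here is a minimal request body assuming an OpenAI-compatible chat-completions API; the exact field names (notably `include_reasoning`) are assumptions to verify against your provider's documentation:

```python
# Sketch: an OpenAI-compatible chat-completions request body that
# exercises the supported sampling parameters listed above.
# Field names are assumed conventions, not taken from official docs.

def build_request(prompt: str) -> dict:
    return {
        "model": "deepseek/deepseek-r1-distill-llama-8b",  # slug from the endpoints table
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
        "temperature": 0.6,
        "top_p": 0.95,
        "min_p": 0.05,
        "presence_penalty": 0.0,
        "frequency_penalty": 0.0,
        "seed": 42,                  # fixed seed for reproducible sampling
        "stop": ["</answer>"],       # example stop sequence
        "include_reasoning": True,   # assumed flag to return the reasoning trace
    }

payload = build_request("What is 17 * 24?")
print(payload["model"])
```

The payload would then be POSTed as JSON to the provider's chat-completions endpoint with an API key in the `Authorization` header.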
Features

This model supports the following features:

Reasoning
Performance Summary

DeepSeek R1 Distill Llama 8B, a distilled large language model based on Llama-3.1-8B-Instruct, demonstrates a compelling balance of performance and cost-efficiency. While its speed ranking places it among the slower models (8th percentile), it consistently offers highly competitive pricing (84th percentile), making it a cost-effective option for various applications.

In benchmark performance, the model exhibits notable strengths in specific areas. It achieves a respectable 81.0% accuracy in the "Coding (Baseline)" benchmark, placing it in the 56th percentile, and performs well in "Reasoning (Baseline)" with 66.0% accuracy (62nd percentile). These results, coupled with impressive AIME 2024 pass@1 (50.4) and MATH-500 pass@1 (89.1) scores, highlight its capabilities in complex problem-solving and technical domains.

However, the model shows weaknesses in "Ethics (Baseline)" and "Email Classification (Baseline)," with accuracy scores of 74.0% (22nd percentile) and 92.0% (22nd percentile) respectively. Its "General Knowledge (Baseline)" performance is also modest at 72.8% accuracy (25th percentile). Despite these areas for improvement, its competitive performance in coding and reasoning, combined with its cost-effectiveness, positions DeepSeek R1 Distill Llama 8B as a valuable option for applications where budget and specific technical proficiencies are key considerations.

Model Pricing

Current Pricing

| Feature | Price (per 1M tokens) |
| --- | --- |
| Prompt | $0.04 |
| Completion | $0.04 |
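At $0.04 per 1M tokens for both prompt and completion, estimating the cost of a request is simple arithmetic; a minimal sketch:

```python
# Cost estimate at the listed rates: $0.04 per 1M tokens for both
# prompt and completion tokens.
PROMPT_RATE = 0.04 / 1_000_000      # USD per prompt token
COMPLETION_RATE = 0.04 / 1_000_000  # USD per completion token

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    return prompt_tokens * PROMPT_RATE + completion_tokens * COMPLETION_RATE

# e.g. a 30K-token prompt (near the 32K context limit) with a 2K-token completion:
cost = estimate_cost(30_000, 2_000)
print(f"${cost:.5f}")  # 32,000 tokens at $0.04/1M = $0.00128
```

Because both directions are priced identically here, only the total token count matters; with asymmetric pricing the two rates would differ.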


Available Endpoints
| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
| --- | --- | --- | --- | --- |
| Novita | deepseek/deepseek-r1-distill-llama-8b | 32K | $0.04 / 1M tokens | $0.04 / 1M tokens |
Benchmark Results