Reka: Flash 3

Input: text · Output: text · Status: unavailable
Author's Description

Reka Flash 3 is a general-purpose, instruction-tuned large language model with 21 billion parameters, developed by Reka. It excels at general chat, coding, instruction following, and function calling. It features a 32K context length and is optimized with reinforcement learning via REINFORCE Leave-One-Out (RLOO), delivering performance competitive with proprietary models at a smaller parameter footprint. The model is compact, supports efficient quantization (down to 11GB at 4-bit precision), and is well suited to low-latency, local, or on-device deployments. It marks its internal thought process with explicit reasoning tags ("<reasoning>"). Reka Flash 3 is primarily an English model with limited multilingual understanding. The model weights are released under the Apache 2.0 license.
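Because the model wraps its thought process in explicit reasoning tags, downstream code typically separates that content from the final answer. The sketch below is a minimal illustration, assuming the output contains a single `<reasoning>...</reasoning>` block as the description suggests; the exact wrapping may vary by provider.

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Separate <reasoning>...</reasoning> content from the final answer.

    Assumes at most one reasoning block per response; adjust the pattern
    if the serving provider formats output differently.
    """
    match = re.search(r"<reasoning>(.*?)</reasoning>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer

# Hypothetical model output for illustration:
sample = "<reasoning>2 + 2 is basic arithmetic.</reasoning>The answer is 4."
thoughts, answer = split_reasoning(sample)
```

Keeping the two parts separate lets an application log or hide the chain of thought while showing users only the answer.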

Key Specifications
Cost
$$
Context
32K
Parameters
21B
Released
Mar 12, 2025
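The "11GB at 4-bit" figure from the description can be sanity-checked with simple arithmetic on the 21B parameter count. This is a back-of-the-envelope sketch that ignores activation memory and quantization overhead.

```python
def quantized_size_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight-storage size in GB (1 GB = 1e9 bytes),
    ignoring activations and quantization metadata overhead."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# 21B parameters at 4-bit precision -> 10.5 GB of raw weights,
# consistent with the ~11GB figure once format overhead is included.
size = quantized_size_gb(21, 4)
```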
Supported Parameters

This model supports the following parameters:

Stop, Frequency Penalty, Include Reasoning, Seed, Min P, Top Logprobs, Top P, Presence Penalty, Temperature, Logit Bias, Reasoning, Max Tokens, Logprobs
Features

This model supports the following features:

Reasoning
Performance Summary

Reka Flash 3, a 21-billion parameter instruction-tuned LLM, demonstrates exceptional speed, consistently ranking among the fastest models across all benchmarks. It also offers highly competitive pricing, placing in the 94th percentile. Reliability is a significant strength, with a perfect 100% success rate across all evaluated benchmarks, indicating robust technical stability.

Performance across benchmarks reveals a mixed but generally strong profile. The model achieved perfect 100% accuracy in Email Classification, notably being the most accurate model at its price point and among models of comparable speed. It also performed well in General Knowledge, securing 96% accuracy, and Coding, with 89% accuracy, placing it in the 71st percentile for coding proficiency. However, a significant weakness is observed in Instruction Following (Baseline), where it recorded 0.0% accuracy, suggesting a critical area for improvement despite its instruction-tuned nature.

Its explicit reasoning tags ("<reasoning>") are a unique feature for transparency. While primarily an English model, its compact size and efficient quantization make it suitable for low-latency and on-device deployments.

Model Pricing

Current Pricing

| Feature    | Price (per 1M tokens) |
|------------|-----------------------|
| Prompt     | $0.013                |
| Completion | $0.013                |
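At $0.013 per 1M tokens for both prompt and completion, per-request cost is straightforward to estimate. A minimal sketch using the listed rates (token counts are illustrative):

```python
PROMPT_PRICE = 0.013      # USD per 1M prompt tokens (listed rate)
COMPLETION_PRICE = 0.013  # USD per 1M completion tokens (listed rate)

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated USD cost of a single request at the listed rates."""
    return (prompt_tokens * PROMPT_PRICE
            + completion_tokens * COMPLETION_PRICE) / 1_000_000

# e.g. a 2,000-token prompt with a 500-token completion:
cost = request_cost(2_000, 500)  # about $0.0000325
```

At these rates, even a million such requests would cost roughly $32.50, which underlines the pricing competitiveness noted in the performance summary.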


Available Endpoints
| Provider | Endpoint Name                | Context Length | Pricing (Input)     | Pricing (Output)    |
|----------|------------------------------|----------------|---------------------|---------------------|
| Chutes   | Chutes \| rekaai/reka-flash-3 | 32K            | $0.013 / 1M tokens  | $0.013 / 1M tokens  |
Benchmark Results