Author's Description
Reka Flash 3 is a general-purpose, instruction-tuned large language model with 21 billion parameters, developed by Reka. It excels at general chat, coding tasks, instruction-following, and function calling. Featuring a 32K context length and optimized through reinforcement learning (RLOO), it provides competitive performance comparable to proprietary models within a smaller parameter footprint. Ideal for low-latency, local, or on-device deployments, Reka Flash 3 is compact, supports efficient quantization (down to 11GB at 4-bit precision), and employs explicit reasoning tags ("<reasoning>") to indicate its internal thought process. Reka Flash 3 is primarily an English model with limited multilingual understanding capabilities. The model weights are released under the Apache 2.0 license.
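The 11GB figure quoted for 4-bit precision can be sanity-checked with simple arithmetic. The sketch below estimates the raw weight size from the parameter count and bits per parameter. It counts only the weights and assumes decimal gigabytes (1 GB = 1e9 bytes). The helper name `weight_footprint_gb` is illustrative and not part of any Reka tooling.

```python
def weight_footprint_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate raw weight size in decimal gigabytes (1 GB = 1e9 bytes).

    Assumption: only the weights are counted. Quantization scales,
    activations, and the KV cache add to this in practice.
    """
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 21e9  # 21 billion parameters, as stated in the description

for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_footprint_gb(PARAMS, bits):.1f} GB")
```

At 4 bits this gives roughly 10.5 GB for the weights alone, which is in line with the quoted 11GB once quantization metadata is included.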
Performance Summary
Reka Flash 3, a 21-billion-parameter instruction-tuned LLM, ranks among the fastest models across all benchmarks. Its pricing is also highly competitive, placing in the 94th percentile. Reliability is a strength: it achieved a 100% success rate across all evaluated benchmarks, indicating robust technical stability.

Accuracy is mixed but generally strong. The model reached 100% accuracy in Email Classification, making it the most accurate model at its price point and among models of comparable speed. It also scored 96% in General Knowledge and 89% in Coding, the latter placing it in the 71st percentile for coding proficiency. Its main weakness is Instruction Following (Baseline), where it recorded 0.0% accuracy, a critical area for improvement given that the model is instruction-tuned.

The explicit reasoning tags ("<reasoning>") make its thought process visible. Although it is primarily an English model, its compact size and efficient quantization make it suitable for low-latency and on-device deployments.
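Because the model wraps its thought process in reasoning tags, client code will often want to separate that text from the final answer. The following is a minimal sketch of one way to do so. It assumes the block is closed with a matching `</reasoning>` tag, which the description above does not state explicitly, and the function name `split_reasoning` is hypothetical.

```python
import re

# Assumption: reasoning is wrapped as <reasoning>...</reasoning>.
# Only the opening tag is named in the description above.
REASONING_RE = re.compile(r"<reasoning>(.*?)</reasoning>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from a raw model response."""
    # Collect every reasoning block, trimmed of surrounding whitespace.
    blocks = [block.strip() for block in REASONING_RE.findall(text)]
    # Whatever remains after removing the blocks is treated as the answer.
    answer = REASONING_RE.sub("", text).strip()
    return "\n".join(blocks), answer


raw = "<reasoning>2 + 2 equals 4.</reasoning>The answer is 4."
reasoning, answer = split_reasoning(raw)
print(reasoning)
print(answer)
```

A response with no reasoning block passes through unchanged as the answer, with an empty reasoning string.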
Model Pricing
Current Pricing
| Feature | Price (per 1M tokens) |
|---|---|
| Prompt | $0.013 |
| Completion | $0.013 |
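As a rough illustration of what these rates mean per request, the sketch below converts token counts into a dollar cost using the prices in the table above. The token counts in the usage example are made up for illustration, and `request_cost_usd` is a hypothetical helper rather than part of any provider SDK.

```python
# Prices from the table above, in USD per 1M tokens.
PROMPT_PRICE_PER_M = 0.013
COMPLETION_PRICE_PER_M = 0.013


def request_cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of one request in USD."""
    prompt_cost = prompt_tokens * PROMPT_PRICE_PER_M / 1_000_000
    completion_cost = completion_tokens * COMPLETION_PRICE_PER_M / 1_000_000
    return prompt_cost + completion_cost


# Hypothetical request: 2,000 prompt tokens and 500 completion tokens.
print(f"${request_cost_usd(2_000, 500):.7f}")
```

Since prompt and completion tokens are priced identically here, the cost depends only on the total token count.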
Available Endpoints
| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
|---|---|---|---|---|
| Chutes | rekaai/reka-flash-3 | 32K | $0.013 / 1M tokens | $0.013 / 1M tokens |
Benchmark Results
| Benchmark | Category | Reasoning | Strategy | Free | Executions | Accuracy | Cost | Duration |
|---|---|---|---|---|---|---|---|---|
Other Models by rekaai
| Model | Released | Params | Context | Modalities | Speed | Ability | Cost |
|---|---|---|---|---|---|---|---|
| Reka Edge | Mar 20, 2026 | ~7B | 16K | Text input, Video input, Image input, Text output | ★★★★ | ★ | $$ |