
Reka Flash 3

Reka Flash 3 is a general-purpose, instruction-tuned large language model with 21 billion parameters, developed by Reka. It excels at general chat, coding, instruction following, and function calling. With a 32K context length and reinforcement-learning post-training (RLOO), it delivers performance competitive with proprietary models at a much smaller parameter count. Compact enough for low-latency, local, or on-device deployments, it quantizes efficiently (down to roughly 11GB at 4-bit precision) and wraps its internal thought process in explicit reasoning tags ("<reasoning>"). Reka Flash 3 is primarily an English model with limited multilingual understanding. The model weights are released under the Apache 2.0 license.
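Since the model emits its internal thought process in "<reasoning>" tags, downstream code typically needs to separate that span from the final answer. A minimal sketch, assuming the reasoning block is closed with a matching "</reasoning>" tag (the card only names the opening tag, so the closing convention here is an assumption):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a response into (reasoning, answer).

    Assumes the model wraps its thought process in
    <reasoning>...</reasoning> before the final answer; the
    closing-tag convention is an assumption, not from the card.
    """
    match = re.search(r"<reasoning>(.*?)</reasoning>", text, re.DOTALL)
    if match is None:
        # No reasoning block found: treat the whole text as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

demo = "<reasoning>2 + 2 is 4.</reasoning>The answer is 4."
print(split_reasoning(demo))
```

The non-greedy match plus `re.DOTALL` keeps the split correct even when the reasoning span covers multiple lines.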

Input / 1M tokens: $0.100
Output / 1M tokens: $0.200
Context window: 66K tokens
Provider: Rekaai
Knowledge cutoff: 2025-01-31
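The listed rates translate directly into per-request cost. A quick sketch using the prices above ($0.100 per 1M input tokens, $0.200 per 1M output tokens); the function name is illustrative:

```python
# Listed rates, in dollars per 1M tokens.
INPUT_PER_M = 0.100
OUTPUT_PER_M = 0.200

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-token rates."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# e.g. a request with 10K input tokens and 2K output tokens:
print(f"${request_cost(10_000, 2_000):.6f}")  # → $0.001400
```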

Performance

Median streaming throughput and first-token latency measured by Artificial Analysis.

Output tokens / sec: 96 t/s
Time to first token: 1.28s
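The two figures above give a back-of-the-envelope wall-clock estimate for a response: first-token latency plus output length divided by decode throughput. A rough sketch assuming decode speed stays constant (real throughput varies with load, context length, and batching):

```python
TTFT_S = 1.28         # median time to first token, seconds
THROUGHPUT_TPS = 96   # median output tokens per second

def est_response_time(output_tokens: int) -> float:
    """Rough wall-clock estimate: TTFT plus steady-state decode time.

    Assumes constant decode throughput, which is an approximation.
    """
    return TTFT_S + output_tokens / THROUGHPUT_TPS

# e.g. a 480-token response (a few paragraphs):
print(f"{est_response_time(480):.2f}s")  # → 6.28s
```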

Benchmarks

Intelligence, coding, and math indexes plus the underlying evaluation scores.

Intelligence Index: 10
Coding Index: 9
Math Index: 34
MMLU-Pro: 66.9%
GPQA: 52.9%
HLE: 5.1%
LiveCodeBench: 43.5%
SciCode: 26.7%
MATH-500: 89.3%
AIME: 51.0%

Benchmarks via Artificial Analysis