Qwen-Max

Qwen-Max, based on Qwen2.5, delivers the strongest performance among [Qwen models](/qwen), especially on complex multi-step tasks. It is a large-scale Mixture-of-Experts (MoE) model, pretrained on over 20 trillion tokens and post-trained with curated Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). Its parameter count has not been disclosed.

| Spec | Value |
| --- | --- |
| Provider | Qwen |
| Context window | 33K tokens |
| Input / 1M tokens | $1.04 |
| Cached input / 1M tokens | $0.208 |
| Output / 1M tokens | $4.16 |
| Knowledge cutoff | 2025-03-31 |
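As a rough illustration of how the per-token prices above combine, here is a minimal sketch that estimates the cost of a single request. The helper function and the example token counts are hypothetical; only the three prices come from the table above.

```python
# Hypothetical cost estimator using the listed Qwen-Max prices (USD per 1M tokens).
INPUT_PRICE = 1.04          # uncached input
CACHED_INPUT_PRICE = 0.208  # cached input
OUTPUT_PRICE = 4.16         # output

def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request; cached_tokens is the cached
    portion of input_tokens and is billed at the discounted rate."""
    uncached = input_tokens - cached_tokens
    return (
        uncached * INPUT_PRICE
        + cached_tokens * CACHED_INPUT_PRICE
        + output_tokens * OUTPUT_PRICE
    ) / 1_000_000

# Example: 10,000 input tokens (8,000 of them cached), 1,000 output tokens.
print(f"${request_cost(10_000, 1_000, cached_tokens=8_000):.6f}")
```

Note that output tokens dominate the bill at 4x the uncached input rate, so prompt caching mainly pays off for long, repeated system prompts.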

Performance

Median streaming throughput and first-token latency measured by Artificial Analysis.

Metrics reported: output tokens per second (streaming throughput) and time to first token (latency). Measured values are not reproduced here.