Qwen-Max
Qwen-Max, based on Qwen2.5, delivers the strongest inference performance among [Qwen models](/qwen), especially on complex multi-step tasks. It is a large-scale Mixture-of-Experts (MoE) model pretrained on over 20 trillion tokens and post-trained with curated Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). The parameter count has not been disclosed.
- Input / 1M tokens: $1.04
- Cached input / 1M tokens: $0.208
- Output / 1M tokens: $4.16
- Context window: 33K tokens
- Provider: Qwen
- Knowledge cutoff: 2025-03-31
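The listed per-million-token rates can be turned into a per-request cost estimate. The sketch below uses the prices from the list above; the request sizes in the example are hypothetical illustration values, not benchmarks.

```python
# Estimated cost of a single Qwen-Max request at the listed rates.
INPUT_PER_M = 1.04      # $ per 1M input tokens
CACHED_PER_M = 0.208    # $ per 1M cached input tokens
OUTPUT_PER_M = 4.16     # $ per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Return the dollar cost of one request.

    cached_tokens is the portion of input_tokens served from the prompt cache,
    billed at the cheaper cached-input rate.
    """
    fresh = input_tokens - cached_tokens
    return (fresh * INPUT_PER_M
            + cached_tokens * CACHED_PER_M
            + output_tokens * OUTPUT_PER_M) / 1_000_000

# e.g. a 10,000-token prompt (half cached) producing a 1,000-token reply
print(f"${request_cost(10_000, 1_000, cached_tokens=5_000):.4f}")  # → $0.0104
```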
Performance
Median streaming throughput and first-token latency measured by Artificial Analysis.
- Output tokens / sec: —
- Time to first token: —