Alibaba releases closed-source Qwen3.7-Max, scales up reinforcement learning compute

On May 20, Alibaba’s Tongyi Qianwen team released Qwen3.7-Max, a closed-source flagship model. It scored 56.6 on the Artificial Analysis Intelligence Index and ranked first among Chinese models on multiple benchmarks covering coding, mathematics, and agent tasks. Compared with its predecessor, Qwen3.7-Max improved nearly fourfold on the challenging CritPt reasoning benchmark; it also outperformed Gemini 3.5 Flash and Claude Opus 4.6 and 4.7. It scored 92.3% on GPQA Diamond, 80.4 on SWE-bench Verified (coding), and 97.1% on HMMT 2026 (math). The model supports a context window of up to 1 million tokens. In tests, it ran continuously for 35 hours while executing more than 1,100 tool calls, which positions it for enterprise-level, long-chain agent scenarios.

Chujie Zheng, a member of the Tongyi Qianwen team, posted on X that the compute allocated to Qwen3.7-Max during its reinforcement learning (RL) training phase ‘far exceeded those used in any previous iteration.’ He described this as only the starting point for scaling up RL, with further work planned. The model is currently available through the API on Alibaba Cloud’s Bailian platform, and its weights have not been released. The launch coincided with the debut of Alibaba’s self-developed AI chip, the Zhenwu M890, which has 144 GB of VRAM and an inter-chip bandwidth of 800 GB/s.

Sources: X @teortaxestex | X @ChujieZheng