Z.ai, a Beijing-based artificial intelligence lab, has released GLM-5.2, an open-weight large language model that nearly matches the performance of proprietary frontier models from Anthropic and OpenAI and was built entirely on non-Nvidia hardware. The model ships under an MIT license with no regional restrictions, giving developers and enterprises a capable alternative they can use and modify freely.
According to multiple benchmarks, GLM-5.2 achieved a FrontierSWE score of 74.4, just 0.7 points (about 1%) behind Claude Opus 4.8’s 75.1 and ahead of GPT-5.5’s 72.6. On SWE-bench Pro, which tests real-world issue resolution, it scored 62.1, surpassing both GPT-5.5 and its own predecessor. Independent evaluator Vals AI reported that GLM-5.2 is the first open-weight model to exceed 30% on ProofBench, only 1 percentage point short of Anthropic’s proprietary systems. The Artificial Analysis Intelligence Index now ranks it as the top open-weight model, with a score of 51, ahead of DeepSeek V4 Pro and Kimi K2.6.
The model uses a Mixture-of-Experts architecture with roughly 744 billion total parameters, of which 40 billion are active per token, and is optimized for long-horizon reasoning and agentic coding tasks. Its 1-million-token context window, five times that of the previous generation, allows multi-file refactors and agentic pipelines to run in a single call. Unsloth AI released 2-bit quantizations that compress the model from 1.51TB to 238GB, though running it locally still requires at least 256GB of unified memory.
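The size figures above can be sanity-checked with simple arithmetic. The following is a rough sketch, not a description of how Unsloth's quantization works: it assumes decimal units (1TB = 10^12 bytes) and takes the "roughly 744 billion" parameter count at face value. The function name bits_per_param is illustrative only.

```python
# Back-of-the-envelope check of the model sizes quoted in the article.
# Assumptions (not stated in the article): decimal units, and exactly
# 744e9 parameters even though the article says "roughly".

TOTAL_PARAMS = 744e9        # total parameters
FULL_SIZE_BYTES = 1.51e12   # 1.51 TB, unquantized
QUANT_SIZE_BYTES = 238e9    # 238 GB, 2-bit quantization
UNIFIED_MEMORY_BYTES = 256e9  # minimum memory cited for local use


def bits_per_param(size_bytes: float, params: float) -> float:
    """Average number of bits stored per parameter for a given file size."""
    return size_bytes * 8 / params


full_bits = bits_per_param(FULL_SIZE_BYTES, TOTAL_PARAMS)
quant_bits = bits_per_param(QUANT_SIZE_BYTES, TOTAL_PARAMS)
compression_ratio = FULL_SIZE_BYTES / QUANT_SIZE_BYTES
headroom_gb = (UNIFIED_MEMORY_BYTES - QUANT_SIZE_BYTES) / 1e9

print(f"Unquantized: {full_bits:.2f} bits per parameter")
print(f"Quantized:   {quant_bits:.2f} bits per parameter")
print(f"Compression: {compression_ratio:.1f}x")
print(f"Headroom in 256GB: {headroom_gb:.0f}GB")
```

Under these assumptions the unquantized weights work out to about 16.2 bits per parameter, consistent with 16-bit storage, and the quantized release to about 2.56 bits per parameter, a compression of roughly 6.3x. The average above 2 bits suggests that some weights are stored at higher precision, although the article does not say so. The 18GB left over within 256GB would have to cover the context cache and runtime overhead.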
Notably, GLM-5.2 was trained on Huawei Ascend chips without any Nvidia hardware, reflecting China’s push toward self-reliance amid U.S. export restrictions. Emad Mostaque, founder of Stability AI, estimated total training costs at approximately $25 million, with 80% spent on post-training, which would make the model significantly cheaper to train than comparable proprietary models.
API pricing is set at $1.40 per million input tokens and $4.40 per million output tokens, sharply undercutting Claude Opus 4.8’s $5 per million input tokens and $25 per million output tokens. The Coding Plan starts at about $18 per month and integrates directly with popular agentic environments.
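To see what the price gap means for a concrete job, the sketch below applies the per-million-token rates quoted above to a workload. The workload size (2 million input tokens, 500,000 output tokens) is a made-up example, and the Pricing class is a hypothetical helper, not part of any vendor SDK. Real bills may differ because of caching discounts, tiering, or other terms the article does not cover.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token API rates in USD (hypothetical helper)."""
    input_per_million: float
    output_per_million: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Return the total USD cost for the given token counts."""
        total = (input_tokens * self.input_per_million
                 + output_tokens * self.output_per_million)
        return total / 1_000_000


# Rates as quoted in the article.
GLM_5_2 = Pricing(input_per_million=1.40, output_per_million=4.40)
CLAUDE_OPUS_4_8 = Pricing(input_per_million=5.00, output_per_million=25.00)

# Illustrative workload only; not taken from the article.
input_tokens, output_tokens = 2_000_000, 500_000

glm_cost = GLM_5_2.cost(input_tokens, output_tokens)
opus_cost = CLAUDE_OPUS_4_8.cost(input_tokens, output_tokens)
cost_ratio = opus_cost / glm_cost

print(f"GLM-5.2:         ${glm_cost:.2f}")
print(f"Claude Opus 4.8: ${opus_cost:.2f}")
print(f"Ratio:           {cost_ratio:.1f}x")
```

For this example workload the totals come to about $5.00 and $22.50, a ratio of about 4.5x. The ratio depends on the mix of tokens: the input rate alone differs by about 3.6x and the output rate by about 5.7x, so output-heavy workloads see the larger saving.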
While benchmarks show convergence with closed models, early practitioner feedback points to real-world inconsistencies. Some users reported unexpected billing discrepancies, as well as performance gaps on debugging tasks when comparing GLM-5.2 with GPT-5.5. Experts caution that aggregate benchmark scores may not fully capture practical reliability, but the release nonetheless represents a significant step toward narrowing the gap between open-weight and proprietary AI systems.