Zhipu AI has identified and resolved two critical bugs affecting its GLM-5 series models in Coding Agent scenarios. The issues, which users had reported since March, produced garbled text and repeated output, and surfaced mainly under high concurrency and long context lengths.

The first bug was a race condition in the PD-separation (prefill/decode disaggregation) architecture: memory was reclaimed before the data it held had been fully consumed, so subsequent writes could overwrite live data. The second bug was in the HiCache system, where asynchronous cache offloading lacked a synchronization step, allowing data to be read before the offload had completed. The fixes have significantly reduced anomaly rates and eliminated certain classes of errors entirely.

Zhipu also discovered that the acceptance rate of speculative sampling can serve as an anomaly detection signal: corrupted context sharply lowers the rate at which draft tokens are accepted, so a sudden drop enables real-time monitoring and automatic retries when issues are detected.

Finally, further optimizations to the LayerSplit KV Cache improved throughput by up to 132% for requests between 40K and 120K tokens in length, with the gains growing as context length increases.
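The first bug amounts to freeing a memory block while another path still holds it. Reference counting is the standard guard against this kind of premature reclamation; the following is a minimal single-threaded sketch of the idea (all names are illustrative, not Zhipu's actual code, and a real serving stack would also need locking):

```python
class RefCountedBlock:
    """A KV-cache-style block that may be reclaimed only after
    every holder has released it."""

    def __init__(self):
        self._refs = 1       # creator holds the first reference
        self.freed = False   # True once the block is safe to reuse

    def acquire(self):
        # A second path (e.g. the decode side) pins the block so the
        # prefill side cannot reclaim it out from under a reader.
        if self.freed:
            raise RuntimeError("use-after-free: block already reclaimed")
        self._refs += 1

    def release(self):
        self._refs -= 1
        if self._refs == 0:
            self.freed = True  # only now may the memory be overwritten

block = RefCountedBlock()
block.acquire()   # second path pins the block
block.release()   # first path finishes; block must NOT be freed yet
assert not block.freed
block.release()   # last holder done; reclamation is now safe
```

Without the reference count, the first `release()` would free the block while the second path was still reading it, which is exactly the data-overwrite pattern described above.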
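The HiCache issue is a classic read-before-write race: a consumer reads a cache entry before the asynchronous offload that populates it has signaled completion. A minimal sketch of the missing synchronization, using Python's `threading.Event` (names are illustrative assumptions, not Zhipu's implementation):

```python
import threading

class CacheBlock:
    """A cache block whose contents are written by an async offload."""

    def __init__(self):
        self._data = None
        self._ready = threading.Event()  # the missing synchronization

    def offload(self, data):
        # Simulates the asynchronous offload: write, then signal readiness.
        self._data = data
        self._ready.set()

    def read(self, timeout=None):
        # Without this wait(), a reader could observe stale or empty
        # data (the premature-read bug).
        if not self._ready.wait(timeout):
            raise TimeoutError("offload not complete")
        return self._data

block = CacheBlock()
writer = threading.Thread(target=block.offload, args=([1, 2, 3],))
writer.start()
value = block.read(timeout=5)  # blocks until the offload has signaled
writer.join()
```

The fix is the `wait()`/`set()` pair: readers park until the writer publishes the data, instead of racing it.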
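The acceptance-rate signal can be wired into serving logic as a simple guard: if the fraction of accepted draft tokens for a request collapses below a baseline, treat the output as suspect and re-issue the request. A hedged sketch under assumed names (the threshold, retry policy, and `generate` callable are all hypothetical, not Zhipu's monitoring code):

```python
def acceptance_rate(accepted: int, proposed: int) -> float:
    """Fraction of speculative draft tokens the target model accepted."""
    return accepted / proposed if proposed else 1.0

def generate_with_retry(generate, threshold=0.3, max_retries=2):
    """Retry generation when the acceptance rate collapses.

    `generate` is a hypothetical callable returning
    (text, accepted_tokens, proposed_tokens). A collapsed acceptance
    rate suggests corrupted context, so we retry instead of returning
    suspect output."""
    for _ in range(max_retries + 1):
        text, accepted, proposed = generate()
        if acceptance_rate(accepted, proposed) >= threshold:
            return text
    raise RuntimeError("acceptance rate stayed below threshold")

# Demo: the first call returns suspect output, the retry succeeds.
attempts = []
def fake_generate():
    attempts.append(1)
    if len(attempts) == 1:
        return ("garbled output", 3, 100)   # collapsed acceptance rate
    return ("clean output", 82, 100)

result = generate_with_retry(fake_generate)
```

The appeal of this signal is that it is already computed as a by-product of speculative decoding, so the anomaly check adds no extra inference cost.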