ZCube, a collaborative effort by Zhipu, Yuxun Network, and Tsinghua University, is a new networking architecture aimed at network congestion in large-model inference deployments. It has been deployed in the GLM-5.1 coding production environment, which runs on a thousand GPUs. The architecture removes the traditional Spine-layer switches and adopts a fully flattened topology with a network diameter of 2 hops. Combined with a hybrid access mechanism, this design balances traffic load across all switches in the network.
In benchmark tests, ZCube reduced hardware costs by 33%, raised average GPU inference throughput by 15%, and cut P99 first-token latency by 40.6%. These results point to ZCube's potential to improve both performance and cost-efficiency in large-scale AI model deployments.
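As a rough illustration of how the reported figures combine, the short Python sketch below computes the implied change in throughput per unit of hardware cost. It assumes that the 33% cost reduction and the 15% throughput gain apply to the same deployment and that hardware is the only cost considered; neither assumption is confirmed by the benchmark report. The function names and the 1000 ms baseline latency are hypothetical and chosen only for the example.

```python
def throughput_per_cost_ratio(cost_reduction: float, throughput_gain: float) -> float:
    """Return new throughput-per-cost divided by old throughput-per-cost.

    Both arguments are fractions (0.33 means 33%). This is plain arithmetic
    on the reported figures, not a model of the ZCube system itself.
    """
    if not 0.0 <= cost_reduction < 1.0:
        raise ValueError("cost_reduction must be in [0, 1)")
    return (1.0 + throughput_gain) / (1.0 - cost_reduction)


def latency_after_reduction(baseline_ms: float, reduction: float) -> float:
    """Apply a fractional latency reduction to a baseline value in milliseconds."""
    return baseline_ms * (1.0 - reduction)


# Reported figures: 33% lower hardware cost, 15% higher average throughput.
ratio = throughput_per_cost_ratio(cost_reduction=0.33, throughput_gain=0.15)
print(f"throughput per unit hardware cost: {ratio:.2f}x")

# Hypothetical baseline: the article gives no absolute P99 latency value.
new_p99 = latency_after_reduction(baseline_ms=1000.0, reduction=0.406)
print(f"P99 first-token latency: 1000 ms -> {new_p99:.0f} ms")
```

Under these assumptions, the two headline numbers together imply roughly 1.7 times as much inference throughput per unit of hardware spend, which is a larger effect than either figure suggests on its own.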
ZCube Network Architecture Enhances Large Model Inference Efficiency
