Zhipu has introduced the GLM-5.1 High-Speed API, which the company says reaches an output speed of 400 tokens per second and presents as a new global benchmark for large-model APIs. The high-speed API is currently available to select enterprise customers.

It runs on a high-performance inference engine developed jointly with the TileRT team. The engine optimizes GPU scheduling by compiling the model into persistent Engine Kernels, which substantially reduces latency. In multi-GPU deployments, TileRT assigns specialized roles to individual GPUs within an 8-GPU NVL topology, improving both attention-layer computation and inter-GPU communication.

Looking ahead, Zhipu plans to further optimize FP8 inference and extend context-length support, targeting low-latency applications such as AI-assisted programming and real-time interaction.
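To put the reported 400 tokens per second in perspective, here is a back-of-the-envelope sketch that converts a steady output rate into average per-token latency and total generation time. It assumes a constant decode rate for a single stream and ignores time to first token (prefill), network overhead, and batching. The function names, the 2,000-token response length, and the 50 tokens/s comparison rate are hypothetical illustrations, not figures from Zhipu.

```python
# Back-of-the-envelope conversion of decode throughput into latency.
# Assumption: a constant output rate for one stream; prefill time,
# network round-trips, and batching effects are deliberately ignored.


def per_token_latency_ms(tokens_per_second: float) -> float:
    """Average time between consecutive output tokens, in milliseconds."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return 1000.0 / tokens_per_second


def generation_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Time to stream num_tokens output tokens at a constant rate, in seconds."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    reported_rate = 400.0  # output speed reported for the GLM-5.1 High-Speed API
    baseline_rate = 50.0   # hypothetical slower endpoint, for comparison only
    response_len = 2000    # hypothetical code-completion length, in tokens

    print(f"{per_token_latency_ms(reported_rate):.2f} ms per token")
    print(f"{generation_time_s(response_len, reported_rate):.1f} s at 400 tok/s")
    print(f"{generation_time_s(response_len, baseline_rate):.1f} s at 50 tok/s")
```

Under these assumptions, tokens arrive on average every 2.5 ms at 400 tokens per second, so a 2,000-token response streams in about 5 seconds, compared with 40 seconds at the hypothetical 50 tokens/s rate. This gap is why output speed matters for interactive uses such as AI-assisted programming.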