Perplexity AI Open-Sources pplx-garden for High-Speed Multi-GPU Inference

Perplexity AI has open-sourced pplx-garden, a high-performance inference toolkit for multi-GPU workloads. Central to the release is fabric-lib, a Rust-based communication library that does not rely on NVIDIA's proprietary communication protocols. This lets developers run trillion-parameter models efficiently across diverse GPU clusters without depending on costly, vendor-specific hardware. fabric-lib supports both NVIDIA ConnectX-7 and AWS EFA (Elastic Fabric Adapter) network interface cards (NICs), reaching network bandwidths of up to 400 Gbps. The toolkit also introduces ImmCounter, a synchronization mechanism for efficient data transfer, and includes a data distribution algorithm optimized for Mixture-of-Experts (MoE) models.

In practice, pplx-garden significantly reduces latency in both inference and training workloads; for example, it completes model weight synchronization in 1.3 seconds. Perplexity has also open-sourced the pplx-unigram tokenizer, which reduces tokenization CPU usage by up to a factor of six and eases tokenization bottlenecks.
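The article names ImmCounter but does not explain how it works. As an illustration only, the Python sketch below assumes that ImmCounter-style synchronization counts RDMA write-with-immediate completions: each remote write carries a 32-bit immediate value tagging the transfer it belongs to, and the receiver treats a transfer as finished once it has counted the expected number of writes for that tag, with no separate acknowledgment message. The class and method names (`ImmCounter`, `on_completion`, `try_consume`) are hypothetical and are not fabric-lib's API, and the hardware completion queue is simulated with a plain list.

```python
from collections import defaultdict


class ImmCounter:
    """Hypothetical sketch of counter-based synchronization over RDMA
    write-with-immediate completions (not fabric-lib's actual API)."""

    IMM_MAX = 2**32  # RDMA immediate data is a 32-bit value

    def __init__(self) -> None:
        # Number of completed writes seen so far, keyed by immediate value.
        self._counts: defaultdict[int, int] = defaultdict(int)

    def on_completion(self, imm: int) -> None:
        """Record one completion-queue entry carrying immediate `imm`."""
        if not 0 <= imm < self.IMM_MAX:
            raise ValueError(f"immediate {imm} does not fit in 32 bits")
        self._counts[imm] += 1

    def try_consume(self, imm: int, expected: int) -> bool:
        """Return True once `expected` writes tagged `imm` have landed.

        On success the count is reduced by `expected`, so the same tag can
        be reused for the next transfer without resetting any state.
        """
        if self._counts[imm] < expected:
            return False
        self._counts[imm] -= expected
        return True


# Simulated completion queue after four writes have landed: all three
# writes for transfer 7, but only one of the two writes for transfer 9.
completions = [7, 9, 7, 7]
counter = ImmCounter()
for imm in completions:
    counter.on_completion(imm)

print(counter.try_consume(7, expected=3))  # True: transfer 7 is complete
print(counter.try_consume(9, expected=2))  # False: transfer 9 is still waiting
```

Under this assumption, the data writes themselves carry the completion signal, so the receiver never waits on an extra "done" message per transfer. Real RDMA code would poll a hardware completion queue (for example with libibverbs' `ibv_poll_cq`) rather than iterate over a Python list.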


Disclaimer: The content provided on Phemex News is for informational purposes only. We do not guarantee the quality, accuracy, or completeness of the information sourced from third-party articles. The content on this page does not constitute financial or investment advice. We strongly encourage you to conduct your own research and consult with a qualified financial advisor before making any investment decisions.
