Zyphra's AMD-Based Diffusion Model Boosts Decoding Speed by Up to 7.7x

Zyphra has released ZAYA1-8B-Diffusion-Preview, a mixture-of-experts (MoE) diffusion language model converted from an autoregressive large language model. Zyphra says it is the first model of this kind built within the AMD hardware ecosystem. Other teams have introduced similar models, but ZAYA1 sets itself apart by using the diffusion architecture to make inference more efficient.

The model targets a core limitation of autoregressive decoding: tokens are generated one at a time, and each step must stream the model's weights from memory, so generation speed is capped by memory bandwidth rather than by compute. Using the TiDAR approach, ZAYA1 denoises 16 candidate tokens in parallel within a single forward pass, turning a memory-bandwidth-bound workload into a compute-bound one.

In real-world tests, ZAYA1 combined with Zyphra's own CCA attention mechanism and a standard lossless sampler achieved a 4.6x decoding speedup. With a mixed logit sampler the speedup rises to 7.7x, significantly lowering the cost of latency-sensitive, large-scale inference.
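The speedup depends on how many drafted tokens survive verification. As a minimal sketch, assuming the "standard lossless sampler" follows the well-known speculative-sampling accept/reject rule (the article does not describe Zyphra's exact sampler), the Python below checks a block of 16 drafted tokens against a target model's next-token probabilities. The toy four-token vocabulary, the flat probability lists, and the names `sample` and `verify_draft` are hypothetical illustrations, not ZAYA1's API.

```python
import random
from typing import List, Sequence

# Hypothetical toy setup: a probability vector over a tiny vocabulary.
Dist = Sequence[float]


def sample(dist: Dist, rng: random.Random) -> int:
    """Draw one token index from a (possibly unnormalized) weight vector."""
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]


def verify_draft(
    draft: List[int],
    q: List[Dist],  # drafter (diffusion) distributions, one per drafted position
    p: List[Dist],  # target (autoregressive) distributions, len(draft) + 1 entries
    rng: random.Random,
) -> List[int]:
    """Standard speculative-sampling accept/reject rule (hypothetical helper).

    Returns the accepted prefix plus one corrected or bonus token; the result
    is distributed exactly as if tokens were sampled from p one at a time.
    """
    out: List[int] = []
    for i, tok in enumerate(draft):
        # Accept the drafted token with probability min(1, p(x) / q(x)).
        ratio = p[i][tok] / q[i][tok]
        if rng.random() < min(1.0, ratio):
            out.append(tok)
            continue
        # Rejected: resample from the residual max(0, p - q) and stop.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p[i], q[i])]
        out.append(sample(residual, rng))
        return out
    # Every drafted token was accepted: take one bonus token from the target.
    out.append(sample(p[len(draft)], rng))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    uniform = [0.25] * 4
    k = 16  # block size matching the 16 candidates mentioned above
    q = [uniform] * k
    draft = [sample(d, rng) for d in q]
    p = [uniform] * (k + 1)
    # Drafter matches target, so all 16 drafts plus one bonus token are kept.
    print(len(verify_draft(draft, q, p, rng)))  # 17
```

Accepting with probability min(1, p(x)/q(x)) and resampling from the normalized residual on rejection keeps the output distribution identical to plain autoregressive sampling, which is what makes such a sampler lossless. A sampler that mixes the two models' logits is not bound by this rule, which is plausibly why it can keep more tokens per pass.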