Aurora Optimizer Boosts Training Efficiency, Reduces Dead Neurons in Muon

Tilde Research has unveiled Aurora, a new optimizer that improves training efficiency by addressing a critical flaw in Muon, an optimizer used in models such as DeepSeek V4 and GLM-5. Muon was found to leave over 25% of neurons in MLP layers inactive during early training. Aurora mitigates this by keeping updates uniform and maintaining orthogonality, which the team reports yields a 100-fold increase in training efficiency.

Aurora can replace Muon with only a 6% increase in computational overhead and requires no additional tuning. In benchmark tests, Aurora set a new state-of-the-art result, demonstrating its effectiveness in improving model performance. Both the optimizer and a pre-trained 1.1B-parameter model have been open-sourced, giving the community access to these advancements.
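To make the "dead neuron" claim concrete: a hidden unit is commonly called dead when its activation is zero for every input in a batch, so it receives no gradient and never recovers. The sketch below is a minimal, hypothetical way to measure that fraction for a single ReLU layer; the function name, layer sizes, and bias values are illustrative assumptions, not Tilde's or Muon's actual diagnostics.

```python
import random

def dead_neuron_fraction(weights, biases, inputs):
    """Fraction of hidden units whose ReLU output is zero for every
    input in the batch -- a rough proxy for 'dead' neurons."""
    hidden = len(biases)
    alive = [False] * hidden
    for x in inputs:
        for j in range(hidden):
            pre = sum(x[i] * weights[i][j] for i in range(len(x))) + biases[j]
            if pre > 0.0:  # ReLU fires for this input
                alive[j] = True
    return alive.count(False) / hidden

# Toy layer: small random weights, Gaussian inputs (illustrative only).
rng = random.Random(0)
in_dim, hidden = 8, 32
W = [[rng.gauss(0, 0.1) for _ in range(hidden)] for _ in range(in_dim)]
X = [[rng.gauss(0, 1) for _ in range(in_dim)] for _ in range(128)]

print(dead_neuron_fraction(W, [0.0] * hidden, X))   # healthy layer → 0.0
print(dead_neuron_fraction(W, [-5.0] * hidden, X))  # collapsed layer → 1.0
```

With zero biases every unit fires on roughly half the inputs, so none are dead; with strongly negative biases the pre-activations never cross zero and the whole layer collapses, which is the failure mode Aurora is said to reduce.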
Disclaimer: The content provided on Phemex News is for informational purposes only. We do not guarantee the quality, accuracy, or completeness of the information sourced from third-party articles. The content on this page does not constitute financial or investment advice. We strongly encourage you to conduct your own research and consult with a qualified financial advisor before making any investment decisions.
