Sakana AI, in collaboration with the University of Tokyo, has launched DiffusionBlocks, a new training framework aimed at reducing GPU memory usage when training large models. Announced at ICLR 2026, DiffusionBlocks divides a neural network into B blocks, each of which can be trained independently. Because updates happen one block at a time, VRAM consumption during training drops to roughly 1/B of the original requirement; with four blocks, for example, that is about a quarter.
The framework addresses the heavy VRAM demands of deep models by loading only the sampled block for each update, while the blocks that were not sampled stay unloaded. According to the reported experiments, this lowers VRAM needs while matching or exceeding traditional training performance on tasks such as vision Transformers and text generation. DiffusionBlocks is also said to help recurrent models: it simulates a dynamic convergence process, which reduces computational cost during training.
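To make the "one block at a time" idea concrete, here is a minimal toy sketch in plain Python. It is not Sakana AI's implementation: the function names (`make_blocks`, `train_blockwise`), the placeholder objective that pulls each weight toward 1.0, and the uniform block sampling are all assumptions made for illustration. It only shows why the number of parameters resident at any moment is bounded by the size of a single block.

```python
import random

def make_blocks(num_blocks, params_per_block):
    # Each "block" is a list of floats standing in for one module's weights.
    return [[0.0] * params_per_block for _ in range(num_blocks)]

def train_blockwise(blocks, steps, lr=0.1, seed=0):
    """Update one randomly sampled block per step; return the peak resident size."""
    rng = random.Random(seed)
    peak_resident = 0
    for _ in range(steps):
        b = rng.randrange(len(blocks))      # sample a single block (assumed uniform)
        resident = list(blocks[b])          # "load" only that block
        peak_resident = max(peak_resident, len(resident))
        # Placeholder local objective 0.5 * (w - 1)^2, whose gradient is (w - 1).
        resident = [w - lr * (w - 1.0) for w in resident]
        blocks[b] = resident                # write back, then "unload"
    return peak_resident

blocks = make_blocks(num_blocks=4, params_per_block=1000)
total_params = sum(len(b) for b in blocks)
peak = train_blockwise(blocks, steps=200)
print(f"resident parameters at peak: {peak} of {total_params}")
```

With four equal blocks, the peak resident count is one quarter of the total, which mirrors the 1/B figure quoted above. Real memory savings would also depend on activations and optimizer state, which this toy does not model.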
Sakana AI Unveils DiffusionBlocks to Slash GPU Memory Usage
