Sakana AI and NVIDIA Enhance GPU Efficiency with TwELL, Boosting H100 Inference by 30%

Sakana AI, in collaboration with NVIDIA, has launched TwELL, an open-source sparse data format and set of acceleration kernels that improve GPU efficiency by skipping computations for inactive neurons. The technique increases H100 inference speed by up to 30% and training speed by up to 24% without compromising model accuracy. TwELL targets the inefficiency in the feedforward network layers of large models, where over 80% of neurons remain inactive during text generation.
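The article does not include code, but the core observation can be illustrated with a minimal NumPy sketch (hypothetical layer sizes; TwELL's actual kernels run on the GPU): after a ReLU, every zeroed neuron contributes nothing to the feedforward layer's output, so the corresponding work can be skipped with no change to the result.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes; the 1.5B-parameter model cited in the
# article is far larger.
d_model, d_ff = 64, 256

x = rng.standard_normal(d_model)           # one token's hidden state
W1 = rng.standard_normal((d_ff, d_model))  # up-projection
W2 = rng.standard_normal((d_model, d_ff))  # down-projection

# Dense FFN: y = W2 @ relu(W1 @ x)
h = np.maximum(W1 @ x, 0.0)
y_dense = W2 @ h

# After ReLU, a zeroed neuron contributes nothing to the output, so
# only the columns of W2 for active neurons need to be touched.
active = np.nonzero(h)[0]
y_sparse = W2[:, active] @ h[active]

assert np.allclose(y_dense, y_sparse)
```

With random weights roughly half the neurons are active; the article's point is that in trained large models the active fraction is far smaller, so the skipped work dominates.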
TwELL optimizes GPU operations by dividing data into small fixed-size blocks, which GPUs can process efficiently while avoiding costly global memory operations. Tests on a 1.5-billion-parameter model showed that only 2% of neurons required computation, with performance maintained across multiple tasks. Because larger models exhibit even lower ratios of active neurons, this optimization could yield still greater speedups as models scale.
