Nous Research has introduced a new pretraining method for large models, Token Stacking Training (TST), which aims to reduce pretraining time by compressing adjacent tokens into bundles. The method, validated on models with up to 10 billion parameters, is reported to accelerate training by 2 to 3 times under the same computational budget. However, controversy arose because TST's mechanism closely resembles one described in a 2024 publication, leading to allegations of plagiarism.
After the paper's release, Nous Research acknowledged the similarities to the earlier work, describing the overlap as an "unfortunate case of convergent research." The team has committed to updating the paper with appropriate citations to address these concerns. The TST method, while innovative, is data-intensive and may therefore face limitations if high-quality text corpora become scarce.
Nous Research's Token Stacking Training Method Faces Plagiarism Allegations
