Tsinghua's HRM-Text Sets New Efficiency Standard in Model Pr

Wang Guan, a Tsinghua University alumnus, and his team have introduced HRM-Text, a pretraining approach that challenges the prevailing large-model paradigm. Built on a Hierarchical Recurrent Model (HRM), HRM-Text uses 100–900 times fewer training tokens and 96–432 times less compute than models with 2B to 7B parameters, while remaining competitive with them on benchmarks such as MMLU and ARC-C. The architecture operates on two time scales: computation is split between a slow module and a fast module, which allows multiple recursive updates per token. This design, combined with targeted training objectives, improves pretraining efficiency. Training the model cost approximately $1,500. The developers acknowledge that further research is needed to decouple knowledge from reasoning and to explore adaptive computation time mechanisms.
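To make the two-time-scale idea concrete, here is a toy sketch in plain Python. It is not HRM-Text's implementation, which this summary does not describe. The function names, the tanh update rule, the state size, and the loop counts (`slow_cycles`, `fast_steps`) are all assumptions chosen for illustration. The only point is to show a fast state updated several times for each update of a slow state, so that a single token receives multiple recurrent updates.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Return a rows x cols matrix of small random weights (illustrative only)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def affine_tanh(weights, vector):
    """Apply a bias-free linear map followed by tanh."""
    return [math.tanh(sum(w * v for w, v in zip(row, vector))) for row in weights]


def two_timescale_update(token, slow, fast, w_fast, w_slow, slow_cycles=2, fast_steps=3):
    """Hypothetical per-token update: the fast state is refreshed fast_steps
    times per cycle, the slow state once per cycle."""
    updates = 0
    for _ in range(slow_cycles):
        for _ in range(fast_steps):
            # Fast module sees its own state, the slow state, and the token.
            fast = affine_tanh(w_fast, fast + slow + token)
            updates += 1
        # Slow module sees its own state and the latest fast state.
        slow = affine_tanh(w_slow, slow + fast)
        updates += 1
    return slow, fast, updates


rng = random.Random(0)
dim = 4  # assumed state size, far smaller than any real model
w_fast = make_matrix(dim, 3 * dim, rng)
w_slow = make_matrix(dim, 2 * dim, rng)
token = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
slow, fast, updates = two_timescale_update(token, [0.0] * dim, [0.0] * dim, w_fast, w_slow)
print(updates)  # 2 cycles * (3 fast + 1 slow) = 8 recurrent updates for one token
```

In this sketch the same weights are reused on every pass, so the extra computation per token comes from the repeated updates, not from additional parameters.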
