The National University of Singapore, Nanyang Technological University, and the Shanghai Artificial Intelligence Laboratory have released Mega-ASR, an open-source speech recognition model designed for noisy environments. Built on the Qwen3-ASR 1.7B backbone, Mega-ASR is reported to improve recognition performance by up to 30% over models such as Whisper and Gemini 3 Pro. The model is available on GitHub under the Apache-2.0 license.
Mega-ASR was trained using the Voices-in-the-wild-2M dataset, which includes 2.4 million samples and simulates 54 complex acoustic scenarios. The model employs Acoustic-to-Semantic Progressive Supervised Fine-Tuning and Dual-Granularity Word Error Rate-Gated Policy Optimization to enhance semantic recovery and reduce errors. A dynamic routing mechanism ensures optimal performance across varying audio qualities.
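The word error rate (WER) named in the training method is the standard speech recognition metric: the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference, divided by the number of reference words. The snippet below is a minimal sketch of that metric only. The function name `word_error_rate` and the lowercase, whitespace-based tokenization are assumptions made for illustration, and the sketch does not reproduce how Mega-ASR computes WER or uses it to gate policy optimization, which the article does not describe.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name); tokenization is a simple
    lowercase whitespace split, which is an assumption of this sketch.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "turn on the kitchen lights"
    hypothesis = "turn on a kitchen light"
    print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

A lower value means fewer transcription errors. Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.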
Mega-ASR Open-Sourced to Boost Speech Recognition in Noisy Environments
