Xiaomi AI Lab has unveiled OmniVoice, an open-source voice cloning model that supports 646 languages. The zero-shot text-to-speech (TTS) model can clone a voice from just a few seconds of reference audio and generate speech in multiple languages while preserving the characteristics of the original voice. The model's code, weights, and training data are released under the Apache-2.0 license.
OmniVoice uses a minimalist architecture in which a single bidirectional Transformer maps text directly to acoustic tokens, bypassing the multi-stage pipelines of traditional TTS systems. It relies on techniques such as full-codebook random masking and initialization from pre-trained parameters to improve efficiency and pronunciation accuracy. Trained on 580,000 hours of open-source data, the model performs strongly on voice similarity and intelligibility, including for low-resource languages. Additional features include text-based voice customization and automatic noise reduction.
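The article does not explain how full-codebook random masking works. As a rough illustration only, the Python sketch below assumes the acoustic tokens form a grid of frames by codebooks and that the mask is sampled over every codebook at once (rather than one codebook layer at a time). Masked positions become prediction targets for a bidirectional model that sees the remaining tokens. The function name `full_codebook_random_mask`, the `MASK_ID` value, and the uniform mask-ratio schedule are hypothetical and are not taken from OmniVoice's code.

```python
import random

MASK_ID = -1  # hypothetical placeholder id for a masked acoustic token


def full_codebook_random_mask(tokens, rng=None):
    """Randomly mask positions across ALL codebooks of a (frames x codebooks) grid.

    Assumption (not from the source): one mask ratio is drawn per example and
    applied independently to every (frame, codebook) position, so any codebook
    layer can be masked in the same training step.
    Returns (masked_grid, targets), where targets holds the original token at
    masked positions and None elsewhere (positions excluded from the loss).
    """
    rng = rng if rng is not None else random.Random()
    ratio = rng.uniform(0.0, 1.0)  # hypothetical schedule: uniform mask ratio
    masked, targets = [], []
    for frame in tokens:
        masked_frame, frame_targets = [], []
        for tok in frame:
            if rng.random() < ratio:
                masked_frame.append(MASK_ID)  # hide the token from the model
                frame_targets.append(tok)     # the model must predict it
            else:
                masked_frame.append(tok)      # visible context
                frame_targets.append(None)    # not part of the loss
        masked.append(masked_frame)
        targets.append(frame_targets)
    return masked, targets
```

For example, `full_codebook_random_mask([[1, 2, 3], [4, 5, 6]], random.Random(0))` returns a masked copy of the grid plus a matching grid of targets. Under this reading, a single bidirectional Transformer would be trained to fill in the non-`None` targets from the text and the visible tokens, which is what lets one model replace a multi-stage pipeline.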
Xiaomi Open-Sources OmniVoice, a 646-Language Voice Cloning Model
Disclaimer: The content provided on Phemex News is for informational purposes only. We do not guarantee the quality, accuracy, or completeness of the information sourced from third-party articles. The content on this page does not constitute financial or investment advice. We strongly encourage you to conduct your own research and consult a qualified financial advisor before making any investment decisions.
