Xiaomi AI Lab has released OmniVoice, an open-source voice cloning model that supports 646 languages. The zero-shot text-to-speech (TTS) model clones a voice from just a few seconds of reference audio and can then generate speech in multiple languages while preserving the original voice's characteristics. The model's code, weights, and training data are available under the Apache-2.0 license. OmniVoice uses a minimalist architecture: a single bidirectional Transformer maps text directly to acoustic tokens, bypassing the traditional multi-stage pipeline. It relies on full-codebook random masking and initialization from pre-trained parameters to improve efficiency and pronunciation accuracy. Trained on 580,000 hours of open-source data, the model is reported to perform well on voice similarity and intelligibility, even for low-resource languages. Additional features include text-based voice customization and automatic noise reduction.
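
The announcement does not say how full-codebook random masking is implemented. As a rough illustration only, the sketch below assumes that acoustic tokens form a frames-by-codebooks grid and that masked positions are drawn uniformly over every cell of that grid, rather than one codebook layer at a time. The function name, `MASK_ID`, and the mask ratio are hypothetical and are not taken from OmniVoice's code.

```python
import random

MASK_ID = -1  # hypothetical sentinel marking a masked position


def random_mask_all_codebooks(tokens, mask_ratio, rng):
    """Mask cells chosen uniformly across every (frame, codebook) position.

    tokens: list of frames, each frame a list of non-negative codebook ids.
    Returns the masked grid and a dict mapping masked cells to their
    original ids, which would serve as prediction targets during training.
    """
    num_frames = len(tokens)
    num_codebooks = len(tokens[0])

    # Enumerate every cell so that all codebooks are eligible for masking.
    cells = [(t, c) for t in range(num_frames) for c in range(num_codebooks)]
    num_masked = min(len(cells), max(1, round(mask_ratio * len(cells))))
    chosen = set(rng.sample(cells, num_masked))

    masked = [
        [MASK_ID if (t, c) in chosen else tokens[t][c]
         for c in range(num_codebooks)]
        for t in range(num_frames)
    ]
    targets = {(t, c): tokens[t][c] for (t, c) in chosen}
    return masked, targets


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy grid: 3 frames, 4 codebooks per frame.
    grid = [[t * 10 + c for c in range(4)] for t in range(3)]
    masked_grid, targets = random_mask_all_codebooks(grid, 0.5, rng)
    for row in masked_grid:
        print(row)
```

In a setup like this, a bidirectional model would see the text together with the partially masked grid and be trained to predict the original ids at the masked cells. How OmniVoice schedules the mask ratio or conditions on the reference audio is not described in the source.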