Alibaba's Qianwen has unveiled its latest full-modal large model, Qwen3.5-Omni. The series includes Instruct versions in three sizes (Plus, Flash, and Light) and features a 256K long-context window. It supports more than 10 hours of audio input and more than 400 seconds of 720p audio/video input at 1 FPS. Pre-trained on large volumes of text and visual data, along with more than 100 million hours of audio/video data, Qwen3.5-Omni excels in full-modal perception and generation.
Qwen3.5-Omni significantly improves on its predecessor, Qwen3-Omni, in multilingual capability. It now supports speech recognition in 113 languages and dialects and speech generation in 36 languages and dialects, a substantial advance in Alibaba's AI offerings.
Alibaba Launches Qwen3.5-Omni Full-Modal Large Model
