Alibaba's Qianwen has unveiled Qwen3.5-Omni, its latest omni-modal large model. The series includes Instruct models in three sizes (Plus, Flash, and Light), all with a 256K-token context window. The models accept more than 10 hours of audio input and more than 400 seconds of 720p audio-video input sampled at 1 FPS. Qwen3.5-Omni was pre-trained on large-scale text and visual data plus over 100 million hours of audio-video data, and it is built for strong perception and generation across all modalities. Its main improvement over its predecessor, Qwen3-Omni, is multilingual coverage: speech recognition now supports 113 languages and dialects, and speech generation supports 36. The release marks a substantial step forward for Alibaba's AI model lineup.