Alibaba International Digital Commerce Group has open-sourced its latest multimodal large model, Ovis2.6-80B-A3B. The model has 80 billion total parameters and uses a Mixture-of-Experts (MoE) architecture that activates only about 3 billion of them per inference, which lowers compute cost. A key addition is the "Think with Image" mechanism, which lets the model actively apply visual tools such as cropping and rotation during image-based reasoning, much as a person would re-examine part of a picture. Ovis2.6 expands its context window to 64K tokens and supports high-resolution images up to 2880×2880 pixels, improving performance on complex visual tasks. With stronger optical character recognition (OCR) and chart analysis, it can process multi-page documents and answer detailed queries about them. The release aims to balance model capacity against operating cost, and it is positioned for information-intensive tasks such as analyzing financial statements and research reports.
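To make the sparse-activation idea concrete, here is a minimal, generic sketch of top-k expert routing in Python. It is not Ovis2.6's actual router: the expert count, the value of k, the toy scalar "experts", and the names `top_k_route` and `moe_forward` are all assumptions chosen for illustration. The only figures taken from the announcement are the 80 billion total and 3 billion active parameters.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and normalize their weights.

    Returns a list of (expert_index, weight) pairs, best expert first.
    """
    ranked = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:k]
    weights = softmax([router_logits[i] for i in ranked])
    return list(zip(ranked, weights))


def moe_forward(x, experts, router_logits, k):
    """Run only the selected experts and mix their outputs by weight."""
    routes = top_k_route(router_logits, k)
    output = sum(weight * experts[i](x) for i, weight in routes)
    used = [i for i, _ in routes]
    return output, used


# Hypothetical setup: 8 toy experts, each just scales its input.
experts = [lambda x, scale=scale: scale * x for scale in range(1, 9)]
# Hypothetical router scores for one input.
router_logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]

output, used = moe_forward(2.0, experts, router_logits, k=2)
print("experts used:", used)
print("output:", round(output, 3))

# Share of parameters active per inference, from the announced figures.
active_fraction = 3 / 80
print(f"active share: {active_fraction:.2%}")
```

Only the selected experts are evaluated, so compute scales with k rather than with the total number of experts. With the announced figures, roughly 3.75% of the parameters are active for a given inference.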