Alibaba Open-Sources Ovis2.6 with Advanced Image-Based Reasoning

Alibaba International Digital Commerce Group has open-sourced its latest multimodal large model, Ovis2.6-80B-A3B. The model has 80 billion total parameters and adopts a Mixture-of-Experts (MoE) architecture that activates only about 3 billion of them per inference, which keeps serving costs down. A key addition is the "Think with Image" mechanism, which lets the model actively call visual tools such as cropping and rotation while it reasons, much as a person would zoom in on or reorient part of an image to inspect it. Ovis2.6 also extends its context window to 64K tokens and accepts high-resolution images up to 2880×2880 pixels, improving performance on complex visual tasks. With stronger optical character recognition (OCR) and chart analysis, it can process multi-page documents and answer detailed queries about them. The release aims to balance model capacity against operating cost, making it well suited to information-dense material such as financial statements and research reports.
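
To make the cost argument concrete, the short Python sketch below works out what share of the parameters is active per inference using the two figures reported above, and shows a toy version of the sparse routing idea behind MoE models. The router is an illustration only: the function names, the eight experts, the gate scores, and the choice of k=2 are assumptions made for this example, not details of Ovis2.6.

```python
import math

TOTAL_PARAMS_B = 80   # reported above: 80 billion total parameters
ACTIVE_PARAMS_B = 3   # reported above: about 3 billion active per inference


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters used for one inference."""
    return active / total


def top_k_route(gate_logits: list[float], k: int) -> dict[int, float]:
    """Select the k highest-scoring experts and softmax-normalize their weights.

    Toy illustration of sparse MoE routing; NOT Ovis2.6's actual router.
    """
    ranked = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    peak = max(gate_logits[i] for i in chosen)  # subtract for numerical stability
    exps = {i: math.exp(gate_logits[i] - peak) for i in chosen}
    norm = sum(exps.values())
    return {i: e / norm for i, e in exps.items()}


if __name__ == "__main__":
    share = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"active share: {share:.2%}")  # 3 / 80 -> 3.75%

    # Hypothetical gate scores for 8 experts; only 2 of them are run.
    weights = top_k_route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.3, 0.9], k=2)
    print(weights)
```

Only the selected experts are evaluated for a given input, so compute per inference tracks the active parameter count rather than the total.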


