Alibaba Open-Sources Ovis2.6 with Advanced Image-Based Reasoning

Alibaba International Digital Commerce Group has open-sourced its latest multimodal large model, Ovis2.6-80B-A3B, which has 80 billion parameters in total. The model adopts a Mixture-of-Experts (MoE) architecture that activates only 3 billion of those parameters per token, keeping inference costs low. Its key innovation is a "Think with Image" mechanism: instead of looking at an image only once, the model can actively call visual tools such as cropping and rotation while it reasons, inspecting details much as a person would zoom in on or turn a picture. Ovis2.6 also extends its context window to 64K tokens and accepts high-resolution images of up to 2880×2880 pixels, improving performance on complex visual tasks. Strengthened optical character recognition (OCR) and chart analysis let it process multi-page documents and answer detailed questions about them efficiently. Together, these changes aim to balance reasoning capability against operating cost, making the model well suited to information-dense material such as financial statements and research reports.
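
The cost claim rests on sparse activation: of the 80 billion parameters, only 3 billion (3/80, or 3.75%) take part in processing each token. The sketch below illustrates generic top-k MoE routing in plain Python. The linear router, the number of experts, the choice of k = 2 and the toy experts are all illustrative assumptions; the announcement does not describe Ovis2.6's actual router or expert layout, so treat this as a conceptual example rather than the model's implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def moe_forward(x: Vector, router_weights: Sequence[Vector],
                experts: Sequence[Expert], k: int = 2) -> Tuple[Vector, List[int]]:
    """Route one token vector x to its top-k experts and mix their outputs.

    Assumption: a simple linear router (one weight vector per expert).
    Experts that are not selected are never evaluated, which is why the
    active parameter count of an MoE model is far below its total.
    """
    # One routing score per expert.
    scores = [dot(w, x) for w in router_weights]
    # Indices of the k highest-scoring experts, best first.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Gate weights: softmax over the selected experts' scores only.
    gates = softmax([scores[i] for i in top])
    out = [0.0] * len(x)
    for gate, idx in zip(gates, top):
        y = experts[idx](x)  # only the selected experts run
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, top


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of parameters used per token, e.g. 3e9 / 80e9 is 3.75%."""
    return active_params / total_params


if __name__ == "__main__":
    # Hypothetical toy experts: expert i scales its input by (i + 1).
    experts = [lambda v, s=i + 1: [s * vi for vi in v] for i in range(4)]
    router = [[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [2.0, 0.0]]
    out, top = moe_forward([1.0, 2.0], router, experts, k=2)
    print("selected experts:", top)
    print("output:", out)
    print("active fraction:", active_fraction(3e9, 80e9))
```

Real MoE layers apply routing independently at every MoE layer and for every token, so different tokens in the same input can use different experts; the toy example above routes a single vector through a single layer.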

Source: View original

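Disclaimer: The content provided by Phemex News is for informational purposes only. We do not guarantee the quality, accuracy, or completeness of information sourced from third-party articles. The content on this page is not financial or investment advice. Always do your own research and consult a qualified financial professional before making any investment decision.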