Alibaba Open-Sources Ovis2.6 with Advanced Image-Based Reasoning

Alibaba International Digital Commerce Group has open-sourced its latest multimodal large model, Ovis2.6-80B-A3B. The model adopts a Mixture-of-Experts (MoE) architecture with 80 billion total parameters, of which only 3 billion are activated per inference, keeping serving costs down. Its key addition is a "Think with Image" mechanism: the model can actively call visual tools such as cropping and rotation while reasoning, re-examining parts of an image much as a person would. Ovis2.6 also expands its context window to 64K tokens and supports high-resolution images up to 2880×2880, improving performance on complex visual tasks. With stronger optical character recognition and chart analysis, it can process multi-page documents and answer detailed queries about them. The release aims to balance model capacity against inference cost, and it targets information-dense material such as financial statements and research reports.
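To make the "80 billion total, 3 billion active" figure concrete, here is a minimal sketch of generic top-k expert routing, the usual idea behind MoE layers. It is not Ovis2.6's actual implementation: the function names `top_k_route` and `moe_forward`, the toy experts, and the choice of k are all assumptions made for illustration. Only the two parameter counts come from the article.

```python
import math

def top_k_route(gate_logits, k):
    """Pick the k highest-scoring experts and return (index, weight) pairs.

    Hypothetical helper: weights are a softmax over the selected logits only,
    so they sum to 1.
    """
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    m = max(gate_logits[i] for i in top)          # subtract max for stability
    exps = [math.exp(gate_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, gate_logits, k):
    """Run only the selected experts and mix their outputs by gate weight."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))

# Parameter counts quoted in the article; everything else is illustrative.
TOTAL_PARAMS = 80e9
ACTIVE_PARAMS = 3e9
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS    # share of weights used per inference

if __name__ == "__main__":
    # Four toy "experts" that just scale their input.
    experts = [lambda x, s=s: s * x for s in (1, 2, 3, 4)]
    print(moe_forward(1.0, experts, [0.1, 2.0, -1.0, 2.0], k=2))
    print(f"active share: {active_fraction:.2%}")
```

Because only the selected experts run, compute per inference scales with the active parameters rather than the total, which is the cost argument the article makes.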


