ByteDance Research has open-sourced Lance, a 3-billion-parameter multimodal model for image and video processing. Trained on 128 A100 GPUs, Lance handles understanding, generation, and editing within a single framework. Rather than scaling up parameter count, Lance relies on a dual-stream Mixture-of-Experts architecture and modality-aware rotary positional encoding to keep compute costs in check and reduce signal interference.
Despite its lightweight design, Lance performs strongly on benchmarks for image and video generation and editing, pointing to a cost-effective way of balancing generation capability with semantic understanding. The release reflects ByteDance's strategy in multimodal AI: a low-compute design that maintains high performance.
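For readers unfamiliar with rotary positional encoding, the idea can be illustrated with a small sketch. The `rope` function below follows the standard rotary formulation. Everything modality-specific is an assumption made for illustration: the `MODALITY_OFFSET` table and the `modality_aware_rope` helper are hypothetical, and the article does not describe how Lance actually makes its encoding modality-aware.

```python
import math


def rope(vec, position, base=10000.0):
    """Apply standard rotary positional encoding to one even-length vector."""
    if len(vec) % 2 != 0:
        raise ValueError("vector length must be even")
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        # Each consecutive pair of features rotates at its own frequency.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


# Hypothetical per-modality position offsets (an assumption, not taken from Lance).
MODALITY_OFFSET = {"text": 0, "image": 1000, "video": 2000}


def modality_aware_rope(vec, position, modality):
    """Illustrative variant: shift the position index by a per-modality offset."""
    return rope(vec, position + MODALITY_OFFSET[modality])


def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))


q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.3, -0.7, 1.0]

# Within one modality, the score depends only on the relative distance (4 here).
same_a = dot(modality_aware_rope(q, 3, "image"), modality_aware_rope(k, 7, "image"))
same_b = dot(modality_aware_rope(q, 10, "image"), modality_aware_rope(k, 14, "image"))

# Across modalities, the offset changes the effective distance and so the score.
cross = dot(modality_aware_rope(q, 3, "text"), modality_aware_rope(k, 7, "image"))
```

In this toy setup, `same_a` and `same_b` agree up to floating-point error because rotary encoding makes scores depend on relative position, while `cross` differs because the two tokens sit in different position ranges. Whether Lance separates modalities this way or by some other mechanism is not stated in the article.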
ByteDance Open-Sources Lance, a 3B-Parameter Multimodal Model
Disclaimer: The content provided on Phemex News is for informational purposes only. We do not guarantee the quality, accuracy, or completeness of the information sourced from third-party articles. The content on this page does not constitute financial or investment advice. We strongly encourage you to conduct your own research and consult with a qualified financial advisor before making any investment decisions.
