ByteDance Research has open-sourced Lance, a 3-billion-parameter multimodal model for images and video. Trained on 128 A100 GPUs, Lance handles understanding, generation, and editing within a single framework. Rather than scaling up parameter count, Lance relies on two architectural choices: a dual-stream Mixture-of-Experts design and modal-aware rotary positional encoding, which together keep compute costs down and reduce interference between the signals of different modalities. Despite its small size, Lance reports strong results on image and video generation and editing benchmarks, suggesting that generation capability and semantic understanding can be balanced cost-effectively. The release reflects ByteDance's bet that multimodal AI can maintain high performance on a modest compute budget.
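
To make the idea of modal-aware rotary positional encoding more concrete, here is a minimal sketch of one common way such a scheme can work. It is not Lance's implementation, whose details are not given here. The function names `rope_rotate` and `modal_rope`, the `"text"` and `"video"` labels, and the choice to split a vector into three equal chunks for time, height, and width are all assumptions made for illustration.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Apply standard rotary encoding to an even-length vector at one scalar position."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, dim, 2):
        # Each consecutive pair is rotated by an angle that grows with position
        # and shrinks with the pair's index (lower frequencies for later pairs).
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def modal_rope(vec, modality, pos):
    """Hypothetical modality-aware variant (an illustration, not Lance's method).

    text:  pos = (index,)   -> the whole vector is rotated by the 1D token index.
    video: pos = (t, h, w)  -> the vector is split into three equal chunks, and
                               each chunk is rotated by one axis of the position.
    An image could be treated as a single-frame video with t = 0 (assumption).
    """
    if modality == "text":
        return rope_rotate(vec, pos[0])
    if modality == "video":
        if len(vec) % 6 != 0:
            raise ValueError("vector length must be divisible by 6 for video tokens")
        chunk = len(vec) // 3
        out = []
        for axis in range(3):
            part = vec[axis * chunk:(axis + 1) * chunk]
            out.extend(rope_rotate(part, pos[axis]))
        return out
    raise ValueError(f"unknown modality: {modality}")


def dot(a, b):
    """Plain dot product, used to inspect attention-style scores."""
    return sum(x * y for x, y in zip(a, b))
```

Because each chunk is a pure rotation, the encoding preserves vector length, and the score `dot(modal_rope(q, "video", p), modal_rope(k, "video", p2))` depends only on the per-axis offsets between `p` and `p2`. This is the property that lets text tokens and visual tokens carry position information suited to their own layout within one attention mechanism.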