Microsoft has open-sourced Lens, a series of 3.8-billion-parameter text-to-image foundation models built for training efficiency and performance. Lens requires only 19.3% of the computational resources used by Alibaba's Z-Image, a saving attributed to optimizations in both data and architecture. The training dataset, Lens-800M, contains 800 million image-text pairs generated with GPT-4.1, with an average prompt length of 109 words.
The Lens series ships in three weight variants for different deployment needs. The Lens-Turbo variant is the fastest, generating a 1024x1024 image in 0.84 seconds. The models support resolutions up to 1440x1440 and a range of aspect ratios. Microsoft has published the weights on Hugging Face under the MIT license and hosts the inference code on GitHub, making both available to developers and researchers.
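To put the quoted figures in perspective, the short Python sketch below converts them into more familiar ratios. It is only back-of-the-envelope arithmetic on the numbers in this article. The function names are illustrative, and the images-per-minute figure assumes images are generated one after another on whatever hardware produced the 0.84-second result, which the article does not specify.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the article.
# Function names are illustrative; nothing here comes from the Lens codebase.

COMPUTE_FRACTION = 0.193         # Lens compute relative to Z-Image (19.3%)
TURBO_SECONDS_PER_IMAGE = 0.84   # Lens-Turbo, one 1024x1024 image


def reduction_factor(fraction: float) -> float:
    """How many times less compute a given fraction represents."""
    return 1.0 / fraction


def images_per_minute(seconds_per_image: float) -> float:
    """Sequential throughput, assuming no batching and no startup cost."""
    return 60.0 / seconds_per_image


def pixel_ratio(side_a: int, side_b: int) -> float:
    """Ratio of pixel counts between two square resolutions."""
    return (side_a * side_a) / (side_b * side_b)


print(f"Compute reduction: about {reduction_factor(COMPUTE_FRACTION):.2f}x")
print(f"Turbo throughput: about {images_per_minute(TURBO_SECONDS_PER_IMAGE):.1f} images/min")
print(f"1440x1440 vs 1024x1024: about {pixel_ratio(1440, 1024):.2f}x the pixels")
```

In other words, 19.3% corresponds to roughly a fivefold reduction in compute, and the maximum 1440x1440 resolution carries nearly twice as many pixels as the 1024x1024 benchmark size.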
Microsoft Open-Sources Lens, a 3.8B Parameter Text-to-Image Model
