Microsoft Research and Zhejiang University have introduced World-R1, a method that improves 3D geometric consistency in text-to-video models using reinforcement learning, without requiring changes to the model architecture or any 3D training data. World-R1 reconstructs 3D Gaussians from generated videos using the Depth Anything 3 model, renders the scene from new viewpoints, and compares those renders with the original frames. The Flow-GRPO reinforcement learning algorithm then fine-tunes the video model using rewards based on reconstruction error, trajectory deviation, and semantic plausibility.
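The reward design described above can be sketched as follows. This is a minimal illustration, not World-R1's published implementation: the function name, the weights, and the use of plain MSE for the reconstruction term are all assumptions made for clarity.

```python
import numpy as np

def geometric_reward(orig_frames, rerendered_frames, traj_error, semantic_score,
                     w_recon=1.0, w_traj=0.5, w_sem=0.5):
    # Hypothetical combined reward for RL fine-tuning: penalize the error
    # between original frames and novel-view re-renders of the reconstructed
    # 3D Gaussians, penalize camera-trajectory deviation, and reward
    # semantic plausibility. Weights are illustrative.
    recon_error = float(np.mean((orig_frames - rerendered_frames) ** 2))
    return -w_recon * recon_error - w_traj * traj_error + w_sem * semantic_score
```

A GRPO-style optimizer would score a group of sampled videos with a reward like this and update the model toward higher-scoring samples.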
The method builds on the open-source Wan 2.1 model, with World-R1-Small and World-R1-Large variants both showing significant gains on 3D consistency metrics: the Large model improves PSNR by 7.91 dB, while the Small model sees a 10.23 dB increase. In blind comparisons, World-R1 achieved a 92% win rate for geometric consistency. The project is open source on GitHub under the CC BY-NC-SA 4.0 license.
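The dB figures above are PSNR (peak signal-to-noise ratio), a standard image-fidelity metric where higher is better. A minimal implementation for images normalized to [0, 1], for reference:

```python
import numpy as np

def psnr(ref, test, max_val=1.0):
    # Peak signal-to-noise ratio in decibels between a reference image and
    # a test image with pixel values in [0, max_val].
    mse = float(np.mean((ref - test) ** 2))
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

A uniform per-pixel error of 0.1 on a [0, 1] image gives a PSNR of 20 dB, which gives a sense of scale for the reported 7.91 dB and 10.23 dB improvements.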
Microsoft and Zhejiang University Unveil World-R1 for Enhanced 3D Video Consistency
Disclaimer: The content provided on Phemex News is for informational purposes only. We do not guarantee the quality, accuracy, or completeness of the information sourced from third-party articles. The content on this page does not constitute financial or investment advice. We strongly encourage you to conduct your own research and consult with a qualified financial advisor before making any investment decisions.
