Fudan University and Meituan's LongCat team have released WBench, an open-source benchmark for evaluating video generation models. WBench assesses how well models follow physical rules, maintain spatiotemporal consistency, and respond to interactive control, using 289 test cases and 1,058 interaction rounds. It supports both first- and third-person perspectives and covers navigation control, agent actions, event editing, and viewpoint switching.
The benchmark uses 22 automated metrics, which achieve a Spearman rank correlation of at least 0.94 with human blind-test win rates. The findings indicate that interactive control is largely independent of rendering quality, and that camera motion control does not by itself ensure agent consistency. Open-source models such as HY-World 1.5 and Matrix-Game 3.0 excel at navigation but struggle with agent identity and viewpoint drift. The benchmark also highlights how difficult non-rigid agents such as animals are to handle, owing to deformation and velocity issues.
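For readers unfamiliar with the statistic, the sketch below shows how a Spearman rank correlation between automated metric scores and human win rates can be computed. It is a minimal illustration only: the function names `average_ranks` and `spearman` and all numbers are hypothetical, and nothing here is taken from WBench's code or results.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-model values, NOT WBench data.
metric_scores = [0.81, 0.64, 0.72, 0.55, 0.90]    # automated metric
human_win_rates = [0.70, 0.45, 0.40, 0.30, 0.85]  # blind-test win rate

rho = spearman(metric_scores, human_win_rates)
print(f"Spearman rho = {rho:.2f}")
```

In this made-up example the two rankings agree except for one swapped pair of models, which gives a correlation of 0.90. A value of 0.94 or higher, as reported for WBench, means the automated metrics order the models almost exactly as human raters do.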
Fudan and Meituan LongCat Launch WBench for Video Generation Benchmarking
