Adobe Open-Sources RAEv2 with 10x Faster Convergence

Adobe Research, in collaboration with the Australian National University and New York University, has open-sourced RAEv2, the second-generation Representation Autoencoder. The project, led by Xie Saining, addresses the main weaknesses of its predecessor: poor reconstruction quality and slow convergence. RAEv2 is an alternative to the traditional Variational Autoencoders (VAEs) used as the latent space for diffusion models. It achieves a generation FID (gFID) of 1.06 on ImageNet in just 80 training steps, a tenfold increase in convergence speed.

The RAEv2 architecture introduces three core optimizations:
1. A multi-layer representation approach that improves reconstruction quality and compression efficiency.
2. Integration of Representation Alignment (REPA), which improves the capture of spatial detail and lets stronger encoders such as DINOv3 perform well in generative tasks.
3. A reformulated diffusion-model output that provides "free" internal guidance at no extra training cost.

RAEv2 outperforms previous models across several metrics and generalizes well to tasks such as text-to-image generation and video synthesis.
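For readers unfamiliar with REPA, the sketch below shows the generic representation-alignment objective as it is usually described: intermediate diffusion-transformer tokens are passed through a small projector and pushed toward the patch features of a frozen pretrained encoder (such as a DINO model) by maximizing patch-wise cosine similarity. This is an illustration of the general technique under stated assumptions, not RAEv2's actual implementation. The article does not say which layer is aligned, how the projector is built, or how the loss is weighted, so the helper names (`mlp_projector`, `repa_loss`), the two-layer SiLU projector, and all shapes are hypothetical. NumPy is used only to keep the example self-contained; real training would use an autodiff framework and add this term, scaled by a weight, to the diffusion loss.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Cosine similarity along the last axis (one value per token)."""
    a_unit = a / (np.linalg.norm(a, axis=-1, keepdims=True) + eps)
    b_unit = b / (np.linalg.norm(b, axis=-1, keepdims=True) + eps)
    return np.sum(a_unit * b_unit, axis=-1)


def mlp_projector(h, w1, b1, w2, b2):
    """Hypothetical two-layer MLP with SiLU, mapping transformer width to encoder width."""
    z = h @ w1 + b1
    z = z / (1.0 + np.exp(-z))  # SiLU activation: z * sigmoid(z)
    return z @ w2 + b2


def repa_loss(hidden, target, params):
    """Negative mean patch-wise cosine similarity (lower is better aligned).

    hidden: (batch, num_patches, d_model) intermediate diffusion-transformer tokens
    target: (batch, num_patches, d_enc) frozen encoder patch features
    params: (w1, b1, w2, b2) weights of the hypothetical projector
    """
    projected = mlp_projector(hidden, *params)
    return -float(np.mean(cosine_similarity(projected, target)))


# Toy usage with random tensors (all sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
B, N, D_MODEL, D_HIDDEN, D_ENC = 2, 16, 32, 64, 24
params = (
    rng.normal(size=(D_MODEL, D_HIDDEN)) * 0.1,
    np.zeros(D_HIDDEN),
    rng.normal(size=(D_HIDDEN, D_ENC)) * 0.1,
    np.zeros(D_ENC),
)
hidden = rng.normal(size=(B, N, D_MODEL))
target = rng.normal(size=(B, N, D_ENC))
loss = repa_loss(hidden, target, params)
print(f"REPA loss: {loss:.4f}")  # always lies in [-1, 1]
```

Because the loss is bounded in [-1, 1] and reaches -1 only when every projected token points in the same direction as its encoder feature, it constrains the direction of the features rather than their scale, which leaves the diffusion objective free to set magnitudes.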
