EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling

Authors: Theodoros Kouzelis, Ioannis Kakogeorgiou, Spyros Gidaris, Nikos Komodakis

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We finetune all autoencoders on Open Images to adhere to the framework used in LDM (Rombach et al., 2022). We finetune for 5 epochs with batch size 10. Detailed specifications of each autoencoder, including spatial compression rates and latent channels, are provided in Appendix E. For DiT (Peebles & Xie, 2023), SiT (Ma et al., 2024) and REPA (Yu et al., 2025), we follow their default settings and train on ImageNet (Deng et al., 2009) with a batch size of 256, where each image is resized to 256×256. Evaluation: For generative performance, we train latent generative models on the latent distribution of each autoencoder and we report Fréchet Inception Distance (FID) (Heusel et al., 2017), sFID (Nash et al., 2021), Inception Score (IS) (Salimans et al., 2016), Precision (Pre.) and Recall (Rec.) (Kynkäänniemi et al., 2019) using 50,000 samples and following the ADM evaluation protocol (Dhariwal & Nichol, 2021).
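For concreteness, the FID quoted in the evaluation protocol above reduces to a Fréchet distance between Gaussian fits of Inception features. A minimal NumPy sketch of that distance, assuming features have already been extracted (the function and argument names here are illustrative, not from the paper's codebase):

```python
import numpy as np

def _sqrtm_psd(a):
    # Matrix square root of a symmetric PSD matrix via eigendecomposition.
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def frechet_distance(feats_a, feats_b):
    """||mu_a - mu_b||^2 + Tr(Sa + Sb - 2 (Sa Sb)^{1/2}) over feature sets."""
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    sa = np.cov(feats_a, rowvar=False)
    sb = np.cov(feats_b, rowvar=False)
    # Tr((Sa Sb)^{1/2}) computed via the symmetric form Sa^{1/2} Sb Sa^{1/2}.
    sa_half = _sqrtm_psd(sa)
    covmean = _sqrtm_psd(sa_half @ sb @ sa_half)
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(sa) + np.trace(sb) - 2.0 * np.trace(covmean))
```

In practice the reported numbers come from the ADM evaluation suite over 50,000 samples; this sketch only shows the distance computation itself.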
Researcher Affiliation Collaboration 1Archimedes, Athena Research Center, Greece 2National Technical University of Athens, Greece 3valeo.ai, France 4University of Crete, Greece 5IACM-FORTH, Greece. Correspondence to: Theodoros Kouzelis <EMAIL>.
Pseudocode No The paper describes the methodology using mathematical equations and descriptive text in Section 3.3 'EQ-VAE: Regularization via equivariance constraints', but does not present any explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps.
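Since the paper gives the method only as equations and text, the core constraint of Section 3.3 can be sketched as a loss that penalizes the mismatch between decoding a transformed latent and applying the same transformation to the input, roughly ||D(τ(E(x))) − τ(x)||², applied stochastically (the setup quotes pα = 0.5 as the default). A toy NumPy sketch; the identity encoder/decoder and `np.rot90` transform are illustrative stand-ins, not the paper's networks:

```python
import numpy as np

def equivariance_loss(x, encode, decode, tau_img, tau_lat, p=0.5, rng=None):
    """Sketch of an equivariance regularizer: with probability p, penalize
    || decode(tau_lat(encode(x))) - tau_img(x) ||^2; otherwise fall back
    to plain reconstruction || decode(encode(x)) - x ||^2."""
    rng = rng or np.random.default_rng()
    if rng.random() < p:
        recon, target = decode(tau_lat(encode(x))), tau_img(x)
    else:
        recon, target = decode(encode(x)), x
    return float(np.mean((recon - target) ** 2))
```

An identity autoencoder is trivially equivariant to rotation, so both branches give zero loss; a real VAE only approaches this as the regularizer shapes its latent space.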
Open Source Code Yes Project page and code: https://eq-vae.github.io/.
Open Datasets Yes We finetune all autoencoders on Open Images to adhere to the framework used in LDM (Rombach et al., 2022). ... For DiT (Peebles & Xie, 2023), SiT (Ma et al., 2024) and REPA (Yu et al., 2025), we follow their default settings and train on ImageNet (Deng et al., 2009)
Dataset Splits Yes For DiT (Peebles & Xie, 2023), SiT (Ma et al., 2024) and REPA (Yu et al., 2025), we follow their default settings and train on ImageNet (Deng et al., 2009) with a batch size of 256... To evaluate reconstruction, we report FID, Peak Signal-to-Noise Ratio (PSNR), Structural Similarity (SSIM) (Wang et al., 2004), and Perceptual Similarity (LPIPS) (Zhang et al., 2018) using the ImageNet validation set.
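Of the reconstruction metrics listed in this row, PSNR is the simplest to state explicitly: 10·log10(MAX²/MSE). A minimal sketch, assuming images normalized to [0, 1] so that MAX = 1 (the function name is illustrative):

```python
import numpy as np

def psnr(ref, test, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB between two same-shape arrays."""
    mse = np.mean((np.asarray(ref, float) - np.asarray(test, float)) ** 2)
    if mse == 0:
        return float("inf")  # identical inputs
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

SSIM and LPIPS are more involved (windowed statistics and a learned perceptual network, respectively) and in practice come from the standard reference implementations cited above.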
Hardware Specification Yes We use NVIDIA A100 GPUs for our evaluation.
Software Dependencies No The paper mentions using 'PyTorch' for MaskGIT reproduction and 'DADAPY' for intrinsic dimension estimation, but specific version numbers for these software dependencies are not provided in the text.
Experiment Setup Yes We finetune for 5 epochs with batch size 10. ... For DiT (Peebles & Xie, 2023), SiT (Ma et al., 2024) and REPA (Yu et al., 2025), we follow their default settings and train on ImageNet (Deng et al., 2009) with a batch size of 256, where each image is resized to 256×256. ... By default we set pα = 0.5.