EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
Authors: Theodoros Kouzelis, Ioannis Kakogeorgiou, Spyros Gidaris, Nikos Komodakis
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We finetune all autoencoders on Open Images to adhere to the framework used in LDM (Rombach et al., 2022). We finetune for 5 epochs with batch size 10. Detailed specifications of each autoencoder, including spatial compression rates and latent channels, are provided in Appendix E. For DiT (Peebles & Xie, 2023), SiT (Ma et al., 2024) and REPA (Yu et al., 2025), we follow their default settings and train on ImageNet (Deng et al., 2009) with a batch size of 256, where each image is resized to 256×256. Evaluation: For generative performance, we train latent generative models on the latent distribution of each autoencoder and we report Fréchet Inception Distance (FID) (Heusel et al., 2017), sFID (Nash et al., 2021), Inception Score (IS) (Salimans et al., 2016), Precision (Pre.) and Recall (Rec.) (Kynkäänniemi et al., 2019) using 50,000 samples and following the ADM evaluation protocol (Dhariwal & Nichol, 2021). |
| Researcher Affiliation | Collaboration | 1Archimedes, Athena Research Center, Greece 2National Technical University of Athens, Greece 3valeo.ai, France 4University of Crete, Greece 5IACM-Forth, Greece. Correspondence to: Theodoros Kouzelis <EMAIL>. |
| Pseudocode | No | The paper describes the methodology using mathematical equations and descriptive text in Section 3.3 'EQ-VAE: Regularization via equivariance constraints', but does not present any explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps. |
| Open Source Code | Yes | Project page and code: https://eq-vae.github.io/. |
| Open Datasets | Yes | We finetune all autoencoders on Open Images to adhere to the framework used in LDM (Rombach et al., 2022). ... For DiT (Peebles & Xie, 2023), SiT (Ma et al., 2024) and REPA (Yu et al., 2025), we follow their default settings and train on ImageNet (Deng et al., 2009) |
| Dataset Splits | Yes | For DiT (Peebles & Xie, 2023), SiT (Ma et al., 2024) and REPA (Yu et al., 2025), we follow their default settings and train on ImageNet (Deng et al., 2009) with a batch size of 256... To evaluate reconstruction, we report FID, Peak Signal-to-Noise Ratio (PSNR), Structural Similarity (SSIM) (Wang et al., 2004), and Perceptual Similarity (LPIPS) (Zhang et al., 2018) using the ImageNet validation set. |
| Hardware Specification | Yes | We use NVIDIA A100 GPUs for our evaluation. |
| Software Dependencies | No | The paper mentions using 'PyTorch' for MaskGIT reproduction and 'DADApy' for intrinsic dimension estimation, but specific version numbers for these software dependencies are not provided in the text. |
| Experiment Setup | Yes | We finetune for 5 epochs with batch size 10. ... For DiT (Peebles & Xie, 2023), SiT (Ma et al., 2024) and REPA (Yu et al., 2025), we follow their default settings and train on ImageNet (Deng et al., 2009) with a batch size of 256, where each image is resized to 256×256. ... By default we set pα = 0.5. |
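Since the paper provides no pseudocode (see the Pseudocode row above), a toy sketch may help convey the equivariance constraint named in Section 3.3: alongside the usual reconstruction term, decoding a spatially transformed latent should match the same transform applied to the input image. The `encode`/`decode` functions below are placeholder stand-ins (average pooling and nearest-neighbour upsampling), not the paper's VAE, and the loss weighting is an assumption; only the overall structure of the objective follows the paper's description.

```python
import numpy as np

def encode(x):
    # Toy "encoder" E: 2x2 average pooling (stand-in for the VAE encoder).
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def decode(z):
    # Toy "decoder" D: nearest-neighbour upsampling (stand-in for the VAE decoder).
    return np.repeat(np.repeat(z, 2, axis=0), 2, axis=1)

def eq_vae_loss(x, transform):
    # Standard reconstruction term: D(E(x)) should match x.
    recon = np.mean((decode(encode(x)) - x) ** 2)
    # Equivariance term: decoding a *transformed* latent, transform(E(x)),
    # should reconstruct the transformed image, transform(x).
    equiv = np.mean((decode(transform(encode(x))) - transform(x)) ** 2)
    # Equal weighting is an assumption for this sketch.
    return recon + equiv

# Example spatial transform: 90-degree rotation, applied to both the
# latent grid and the image grid.
rot90 = lambda a: np.rot90(a)

x = np.random.default_rng(0).standard_normal((8, 8))
loss = eq_vae_loss(x, rot90)
```

The quoted default pα = 0.5 suggests the transform is only applied stochastically during training (e.g., falling back to the identity otherwise), so a training loop would sample the transform per batch rather than apply it always.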