Hyper-Transforming Latent Diffusion Models

Authors: Ignacio Peis, Batuhan Koyuncu, Isabel Valera, Jes Frellsen

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4. Experiments: Our evaluations span multiple domains: (1) natural image datasets, including CelebA-HQ (Liu et al., 2015) at several resolutions and ImageNet (Russakovsky et al., 2015); (2) 3D objects, specifically the Chairs subclass from the ShapeNet repository (Chang et al., 2015); and (3) polar climate data, using the ERA5 temperature dataset (Hersbach et al., 2019)... In Figure 5, we provide samples from LDMI trained on CelebA at 64×64 and 256×256... Additionally, in Table 1, we report FID scores of our samples for CelebA and ImageNet, highlighting the model's performance in terms of image quality and reconstruction accuracy.
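The FID scores mentioned above are Fréchet Inception Distances: the distance between two Gaussians fitted to Inception feature statistics of real and generated images. A minimal sketch of the underlying Fréchet distance computation (illustrative only, not the authors' evaluation code; assumes NumPy and SciPy):

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2})."""
    diff = mu1 - mu2
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        # sqrtm can return tiny imaginary parts from numerical noise
        covmean = covmean.real
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)

# In a real FID pipeline, mu/sigma come from Inception-v3 features of
# real vs. generated images; here we use toy statistics for illustration.
mu, sigma = np.zeros(4), np.eye(4)
print(frechet_distance(mu, sigma, mu, sigma))  # identical stats -> ~0.0
```

Lower is better; identical feature distributions give a distance of zero.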
Researcher Affiliation | Academia | 1 Department of Applied Mathematics and Computer Science, Technical University of Denmark, Copenhagen, Denmark; 2 Pioneer Centre for Artificial Intelligence, Copenhagen, Denmark; 3 Saarland University, Saarbrücken, Germany; 4 Zuse School ELIZA. Correspondence to: Ignacio Peis <EMAIL>.
Pseudocode | Yes | Algorithm 1 (Hyper-Transforming LDM) and Algorithm 2 (Training LDMI) are outlined in Appendix B of the paper, detailing the procedural steps for training.
Open Source Code | Yes | The code for reproducing our experiments can be found at https://github.com/ipeis/LDMI.
Open Datasets | Yes | Our evaluations span multiple domains: (1) natural image datasets, including CelebA-HQ (Liu et al., 2015) at several resolutions and ImageNet (Russakovsky et al., 2015); (2) 3D objects, specifically the Chairs subclass from the ShapeNet repository (Chang et al., 2015)...; and (3) polar climate data, using the ERA5 temperature dataset (Hersbach et al., 2019)...
Dataset Splits | No | The paper mentions 'original images from the test split' in Section 4.3 and 'Reconstructions of test CelebA-HQ (256×256) images' in Figure 9, implying a test set is used. However, it does not provide specific percentages, absolute sample counts, or a detailed methodology for how the datasets were split into training, validation, and test sets.
Hardware Specification | Yes | All models were trained on NVIDIA H100 GPUs.
Software Dependencies | No | The paper provides extensive details on model architecture and hyperparameters in Table 5 but does not specify any software libraries (e.g., PyTorch, TensorFlow) or their version numbers, which are crucial for reproducibility.
Experiment Setup | Yes | This section outlines the hyperparameter settings used to train LDMI across datasets... Table 5 summarizes the hyperparameters used in our experiments, covering both stages of the generative framework: (i) the first-stage autoencoder (either a VQ-VAE or β-VAE, depending on the dataset) and (ii) the second-stage latent diffusion model. It lists architectural choices such as latent dimensionality, diffusion steps, attention resolutions, and optimization settings (e.g., batch size, learning rate), along with details of the HD decoder, tokenizer, Transformer modules, and INR architecture.
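The second-stage latent diffusion model referenced above is trained, in standard LDM practice, by noising autoencoder latents and regressing the injected noise. A minimal NumPy sketch of one such epsilon-prediction training step (a generic DDPM-style objective for illustration, not the paper's implementation; the schedule values and the toy denoiser are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 2e-2, T)      # assumed linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)    # cumulative signal-retention factors

def q_sample(z0, t, eps):
    """Forward-diffuse a clean latent z0 to timestep t in closed form:
    z_t = sqrt(alpha_bar_t) z0 + sqrt(1 - alpha_bar_t) eps."""
    return np.sqrt(alphas_bar[t]) * z0 + np.sqrt(1.0 - alphas_bar[t]) * eps

def training_step(z0, eps_model):
    """One denoising step: sample t and noise, corrupt z0, score the
    model's noise prediction with a simple MSE (epsilon-prediction loss)."""
    t = int(rng.integers(T))
    eps = rng.standard_normal(z0.shape)
    z_t = q_sample(z0, t, eps)
    eps_hat = eps_model(z_t, t)
    return np.mean((eps_hat - eps) ** 2)

# Toy stand-in for the denoiser network; z0 stands in for a first-stage latent.
loss = training_step(rng.standard_normal(16), lambda z_t, t: np.zeros_like(z_t))
```

In a real setup the lambda would be a conditioned Transformer/U-Net and the loss would be backpropagated; the sketch only shows the shape of the objective.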