Improving the Diffusability of Autoencoders

Authors: Ivan Skorokhodov, Sharath Girish, Benran Hu, Willi Menapace, Yanyu Li, Rameen Abdal, Sergey Tulyakov, Aliaksandr Siarohin

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our approach on both image and video autoencoders, including Flux AE (Black Forest Labs, 2023), Cosmos Tokenizer (Agarwal et al., 2025), CogVideoX-AE (Hong et al., 2022), and LTX-AE (HaCohen et al., 2024), consistently demonstrating improved LDM performance on ImageNet-1K (Deng et al., 2009) 256², reducing FID by 19% for DiT-XL, and Kinetics-700 (Carreira et al., 2019) 17×256², reducing FVD by at least 44%.
Researcher Affiliation | Collaboration | ¹Snap Inc., ²Carnegie Mellon University. Correspondence to: Ivan Skorokhodov <EMAIL>, Aliaksandr Siarohin <EMAIL>.
Pseudocode | No | The paper describes methods using mathematical equations and structured steps in paragraph form, but does not contain explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present procedures formatted like code.
Open Source Code | Yes | The source code is available at https://github.com/snap-research/diffusability.
Open Datasets | Yes | We validate our approach on both image and video autoencoders... demonstrating improved LDM performance on ImageNet-1K (Deng et al., 2009) 256², reducing FID by 19% for DiT-XL, and Kinetics-700 (Carreira et al., 2019) 17×256², reducing FVD by at least 44%.
Dataset Splits | Yes | For image models, we use 50,000 samples without any optimization for class balancing. To evaluate autoencoders, we used PSNR, SSIM, LPIPS, and FID metrics computed on 512 samples from ImageNet and Kinetics-700 for image and video autoencoders, respectively.
Hardware Specification | Yes | Our models were trained in the FSDP (Zhao et al., 2023) framework with the full sharding strategy on a single node of 8 NVIDIA A100 80GB GPUs or 8 NVIDIA H100 80GB GPUs (depending on their availability in our computational cluster).
Software Dependencies | No | The paper mentions frameworks and optimizers like the 'FSDP (Zhao et al., 2023) framework' and the 'AdamW (Loshchilov, 2017) optimizer', but does not provide specific version numbers for programming languages or key software libraries required to replicate the experiments.
Experiment Setup | Yes | All the LDM models are trained for 400k steps with 10k warmup steps of the learning rate from 0 to 0.0003, followed by its gradual decay towards 0.00001. We used a weight decay of 0.01 and the AdamW (Loshchilov, 2017) optimizer with beta coefficients of 0.9 and 0.99. We used gradient clipping with a norm of 16 for all the DiT models. Other hyperparameters for autoencoder training are provided in Table 5.
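The dataset-splits row quotes PSNR among the autoencoder reconstruction metrics. As a minimal illustration only (this is the standard PSNR definition, not the paper's evaluation code), PSNR over a reference/reconstruction pair can be computed as:

```python
import numpy as np

def psnr(ref: np.ndarray, rec: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio: 10 * log10(data_range^2 / MSE)."""
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)

# Toy check: a reconstruction uniformly off by 0.1 on [0, 1] images.
ref = np.zeros((8, 8))
rec = np.full((8, 8), 0.1)
print(round(psnr(ref, rec), 2))  # 20.0
```

Higher is better; the paper pairs it with SSIM, LPIPS, and FID, which capture perceptual rather than pixel-wise error.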
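The experiment-setup row describes a learning-rate schedule: 10k warmup steps from 0 to 3e-4, then gradual decay to 1e-5 over the 400k-step run. A hedged sketch of such a schedule follows; the decay shape is an assumption (linear), since the quoted text only says "gradual":

```python
def lr_at(step: int,
          total_steps: int = 400_000,
          warmup_steps: int = 10_000,
          peak_lr: float = 3e-4,
          final_lr: float = 1e-5) -> float:
    """Linear warmup to peak_lr, then decay toward final_lr.

    The warmup/peak/final values come from the quoted setup; the
    linear decay shape is an assumption for illustration.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr + (final_lr - peak_lr) * progress

print(lr_at(5_000))    # mid-warmup: 0.00015
print(lr_at(10_000))   # peak: 0.0003
print(lr_at(400_000))  # final: ~1e-05
```

The remaining quoted settings (AdamW with betas 0.9/0.99, weight decay 0.01, gradient-clipping norm 16) would be passed to the optimizer alongside this per-step learning rate.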