Efficiently Access Diffusion Fisher: Within the Outer Product Span Space

Authors: Fangyikang Wang, Hubery Yin, Shaobin Zhuang, Huminhao Zhu, Yinan Li, Lei Qian, Chao Zhang, Hanbin Zhao, Hui Qian, Chen Li

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments in likelihood evaluation and adjoint optimization demonstrate the superior accuracy and reduced computational cost of our proposed algorithms. Additionally, based on the novel outer-product formulation of DF, we design the first numerical verification experiment for the optimal transport property of the general PF-ODE deduced map.
Researcher Affiliation | Collaboration | 1: Zhejiang University; 2: WeChat Vision, Tencent Inc.; 3: Shanghai Jiao Tong University.
Pseudocode | Yes | Algorithm 1: Training of DF-TM Network; Algorithm 2: Numerical OT test for PF-ODE map; Algorithm 3: Detailed numerical OT test for PF-ODE map.
Open Source Code | Yes | The code is available at https://github.com/zituitui/DiffusionFisher.
Open Datasets | Yes | In commercial-level experiments, we trained two DF-TM networks for the SD-1.5 and SD-2-base pipelines on the Laion2B-en dataset (Schuhmann et al., 2022), which contains 2.32 billion text-image pairs. In Figure 1b, we evaluate the average NLL and CLIP score of samples generated by various SD models, using 10k randomly selected prompts from the COCO dataset (Lin et al., 2014).
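The prompt-based evaluation above can be sketched as a reproducible random draw from a caption pool. This is a minimal illustration, not the authors' code: the function name `select_prompts` and the toy caption list are assumptions, standing in for the actual COCO caption annotations.

```python
import random

# Hedged sketch (not the authors' pipeline): reproducibly draw a random
# subset of caption prompts from a COCO-style caption list, as done for
# the 10k-prompt NLL / CLIP-score evaluation described above.
def select_prompts(captions, k=10_000, seed=0):
    """Sample up to k distinct prompts with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    return rng.sample(captions, min(k, len(captions)))

# Toy stand-in for the real caption pool.
toy_captions = [f"a photo of object {i}" for i in range(20)]
subset = select_prompts(toy_captions, k=5)
print(len(subset))  # 5
```

Fixing the RNG seed is what makes such an evaluation subset reproducible across runs; the paper itself does not state which seed was used.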
Dataset Splits | No | The paper mentions evaluating on "10k randomly selected prompts from the COCO dataset" and tuning on "1k COCO prompts", but does not provide specific training/validation/test splits for the main datasets used (e.g., Laion2B-en) that would be required to reproduce the data partitioning for model training.
Hardware Specification | Yes | The training is executed across 8 V100 chips with a batch size of 384 and completed after 150K steps. For experiments on Pick-Score, we use NVIDIA V100 chips, and the remaining experiments use Tesla V100 chips.
Software Dependencies | No | The paper mentions using the "AdamW optimizer (Loshchilov & Hutter, 2019)", but it does not specify version numbers for this or any other software components, such as programming languages, frameworks, or libraries (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | We utilize the AdamW optimizer (Loshchilov & Hutter, 2019) with a learning rate of 1e-4. The training is executed across 8 V100 chips with a batch size of 384 and completed after 150K steps. We evaluate the NLL across 10 steps throughout the timeline of the PF-ODE. For all experiments, we set the number of sampling steps to T = 50. Adjoint guidance is applied starting from steps ranging from 15 to 35 and ending at step 35, with one guidance update per step. Thus the only parameter we tune is the guidance strength. We determine this value for the VJP method via a grid search from 0.1 to 0.5 with a step size of 0.1 and find that the optimal guidance strength for VJP is 0.2.
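The one-parameter tuning procedure above amounts to a simple grid search over guidance strength. The following is a hedged sketch under stated assumptions: `sample_and_score` is a placeholder for the paper's actual sampling-plus-evaluation pipeline (run the T = 50-step PF-ODE sampler with adjoint guidance in the stated window, then score the samples); here it is a stub objective that peaks at 0.2 purely to make the sketch runnable.

```python
# Hedged sketch of the guidance-strength grid search described above.
# All names below are illustrative, not the authors' code.

T = 50                                 # total sampling steps (from the paper)
GUIDANCE_START, GUIDANCE_END = 15, 35  # adjoint-guidance window (from the paper)
STRENGTHS = [round(0.1 * k, 1) for k in range(1, 6)]  # grid: 0.1, 0.2, ..., 0.5

def sample_and_score(strength, start=GUIDANCE_START, end=GUIDANCE_END):
    """Placeholder: run the T-step sampler, applying one adjoint-guidance
    update per step in [start, end] with the given strength, and return a
    scalar quality score (higher is better). Stubbed here with an objective
    peaking at 0.2, mimicking the reported optimum."""
    return -abs(strength - 0.2)

def grid_search():
    """Return the guidance strength with the best score on the grid."""
    scores = {s: sample_and_score(s) for s in STRENGTHS}
    return max(scores, key=scores.get)

best = grid_search()
print(best)  # with this stub objective: 0.2
```

In practice `sample_and_score` would be the expensive part (a full T-step diffusion sampling run per candidate), which is why the search is kept to a coarse five-point grid.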