Efficiently Access Diffusion Fisher: Within the Outer Product Span Space

Authors: Fangyikang Wang, Hubery Yin, Shaobin Zhuang, Huminhao Zhu, Yinan Li, Lei Qian, Chao Zhang, Hanbin Zhao, Hui Qian, Chen Li

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments in likelihood evaluation and adjoint optimization demonstrate the superior accuracy and reduced computational cost of our proposed algorithms. Additionally, based on the novel outer-product formulation of DF, we design the first numerical verification experiment for the optimal transport property of the general PF-ODE deduced map.
Researcher Affiliation | Collaboration | 1: Zhejiang University; 2: WeChat Vision, Tencent Inc.; 3: Shanghai Jiao Tong University.
Pseudocode | Yes | Algorithm 1: Training of DF-TM Network; Algorithm 2: Numerical OT test for PF-ODE map; Algorithm 3: Detailed numerical OT test for PF-ODE map.
Open Source Code | Yes | The code is available at https://github.com/zituitui/DiffusionFisher.
Open Datasets | Yes | In commercial-level experiments, we trained two DF-TM networks for the SD-1.5 and SD-2-base pipelines on the Laion2B-en dataset (Schuhmann et al., 2022), which contains 2.32 billion text-image pairs. In Figure 1b, we evaluate the average NLL and CLIP score of samples generated by various SD models, using 10k randomly selected prompts from the COCO dataset (Lin et al., 2014).
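The prompt-based evaluation above can be sketched as a reproducible random draw from a caption pool. This is a minimal illustration, not the authors' code: the function name `select_prompts` and the toy caption list are assumptions, standing in for the actual COCO caption annotations.

```python
import random

# Hedged sketch (not the authors' pipeline): reproducibly draw a random
# subset of caption prompts from a COCO-style caption list, as done for
# the 10k-prompt NLL / CLIP-score evaluation described above.
def select_prompts(captions, k=10_000, seed=0):
    """Sample up to k distinct prompts with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    return rng.sample(captions, min(k, len(captions)))

# Toy stand-in for the real caption pool.
toy_captions = [f"a photo of object {i}" for i in range(20)]
subset = select_prompts(toy_captions, k=5)
print(len(subset))  # 5
```

Fixing the RNG seed is what makes such an evaluation subset reproducible across runs; the paper itself does not state which seed was used.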
Dataset Splits | No | The paper mentions evaluating on "10k randomly selected prompts from the COCO dataset" and tuning on "1k COCO prompts", but does not provide specific training/validation/test splits for the main datasets used (e.g., Laion2B-en) that would be required to reproduce the data partitioning for model training.
Hardware Specification | Yes | The training is executed across 8 V100 chips with a batch size of 384 and completed after 150K steps. For experiments on Pick-Score, we use NVIDIA V100 chips, and the remaining experiments use Tesla V100 chips.
Software Dependencies | No | The paper mentions using the "AdamW optimizer (Loshchilov & Hutter, 2019)", but it does not specify version numbers for this or any other software components, such as programming languages, frameworks, or libraries (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | We utilize the AdamW optimizer (Loshchilov & Hutter, 2019) with a learning rate of 1e-4. The training is executed across 8 V100 chips with a batch size of 384 and completed after 150K steps. We evaluate the NLL across 10 steps throughout the timeline of the PF-ODE. For all experiments, we set the number of sampling steps to T = 50. Adjoint guidance is applied starting from steps ranging from 15 to 35 and ending at step 35, with one guidance update per step. Thus the only parameter we tune is the guidance strength. We determine this value for the VJP method via a grid search from 0.1 to 0.5 with a step size of 0.1 and find that the optimal guidance strength for VJP is 0.2.
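The one-parameter tuning procedure above amounts to a simple grid search over guidance strength. The following is a hedged sketch under stated assumptions: `sample_and_score` is a placeholder for the paper's actual sampling-plus-evaluation pipeline (run the T = 50-step PF-ODE sampler with adjoint guidance in the stated window, then score the samples); here it is a stub objective that peaks at 0.2 purely to make the sketch runnable.

```python
# Hedged sketch of the guidance-strength grid search described above.
# All names below are illustrative, not the authors' code.

T = 50                                 # total sampling steps (from the paper)
GUIDANCE_START, GUIDANCE_END = 15, 35  # adjoint-guidance window (from the paper)
STRENGTHS = [round(0.1 * k, 1) for k in range(1, 6)]  # grid: 0.1, 0.2, ..., 0.5

def sample_and_score(strength, start=GUIDANCE_START, end=GUIDANCE_END):
    """Placeholder: run the T-step sampler, applying one adjoint-guidance
    update per step in [start, end] with the given strength, and return a
    scalar quality score (higher is better). Stubbed here with an objective
    peaking at 0.2, mimicking the reported optimum."""
    return -abs(strength - 0.2)

def grid_search():
    """Return the guidance strength with the best score on the grid."""
    scores = {s: sample_and_score(s) for s in STRENGTHS}
    return max(scores, key=scores.get)

best = grid_search()
print(best)  # with this stub objective: 0.2
```

In practice `sample_and_score` would be the expensive part (a full T-step diffusion sampling run per candidate), which is why the search is kept to a coarse five-point grid.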