Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets

Authors: Zhen Liu, Tim Xiao, Weiyang Liu, Yoshua Bengio, Dinghuai Zhang

ICLR 2025

Reproducibility Assessment (each entry lists the variable, the result, and the supporting LLM response)
Research Type: Experimental
  "In Figure 6 and Table 1, we show the evolution of reward, DreamSim diversity and FID scores of all methods with the mean curves and the corresponding standard deviations (on 3 random seeds). Our proposed residual ∇-DB is able to achieve comparable convergence speed, measured in update steps, to that of the gradient-free baselines..."
Researcher Affiliation: Collaboration
  1. Mila, Université de Montréal; 2. Max Planck Institute for Intelligent Systems, Tübingen; 3. The Chinese University of Hong Kong (Shenzhen); 4. University of Tübingen; 5. University of Cambridge; 6. Microsoft Research
Pseudocode: Yes
  Appendix A (Overall Algorithm): "Algorithm 1: ∇-GFlowNet Diffusion Finetuning with residual ∇-DB"
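For orientation, the detailed-balance (DB) constraint that the algorithm builds on can be sketched in a few lines. Note this is the *standard* squared log-ratio DB objective for a single transition, not the paper's residual gradient-informed ∇-DB variant, and all names below are illustrative:

```python
def detailed_balance_loss(log_f_s, log_pf, log_f_next, log_pb):
    # Squared residual of the detailed balance constraint in log space:
    #   F(s) * P_F(s'|s) = F(s') * P_B(s|s')
    # where F is the state flow, P_F the forward (denoising) policy and
    # P_B the backward (noising) policy. The loss is zero exactly when
    # the constraint holds for this transition.
    return (log_f_s + log_pf - log_f_next - log_pb) ** 2
```

In GFlowNet finetuning of a diffusion model, each denoising step s_t → s_{t-1} contributes one such transition term.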
Open Source Code: Yes
  Project page: nabla-gfn.github.io
Open Datasets: Yes
  "For the main experiments, we consider three reward functions: Aesthetic Score [28], Human Preference Score (HPSv2) [69, 70] and Image Reward [71], all of which are trained on large-scale human preference datasets such as LAION-aesthetic [28] and predict the logarithm of reward values."
Dataset Splits: No
  No specific dataset splits (training, validation, test) are explicitly provided as percentages or sample counts. The paper describes data collection and usage during training, such as collecting "64 generation trajectories" and sub-sampling "10% of the transitions" for loss computation, but not a defined split of an overall dataset for reproducibility.
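The collect-and-subsample step described above can be sketched in pure Python. The trajectory count and the 10% fraction come from the quoted text; the 50-step horizon is an assumption matching the DDPM sampler mentioned elsewhere in this table, and the function name is illustrative:

```python
import random

def collect_and_subsample(num_trajectories=64, horizon=50, frac=0.10, seed=0):
    # Each T-step diffusion trajectory yields T transitions (s_t -> s_{t-1});
    # here a transition is represented only by its (trajectory, step) tag.
    transitions = [(i, t)
                   for i in range(num_trajectories)
                   for t in range(horizon)]
    # Sub-sample a fixed fraction of all transitions for loss computation.
    rng = random.Random(seed)
    k = int(len(transitions) * frac)
    return rng.sample(transitions, k)
```

With 64 trajectories of 50 steps, this selects 320 of the 3200 transitions per epoch.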
Hardware Specification: Yes
  "All methods are benchmarked on a single node with 8 80GB-mem A100 GPUs."
Software Dependencies: No
  No specific software dependencies with version numbers are provided. The paper mentions a "50-step DDPM sampler [17]", "Stable Diffusion-v1.5 [51]" as the base model, and "LoRA [21]" for finetuning, but these are models and techniques, not software dependencies with pinned versions of languages or libraries.
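As a library-independent illustration of what a DDPM-style sampler involves, here is a minimal linear noise schedule. The constants are the original DDPM paper's defaults, not values confirmed by this paper, and in practice a 50-step sampler subsamples the base model's much longer training schedule:

```python
def ddpm_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    # Linearly spaced per-step noise variances (betas), as in DDPM.
    betas = [beta_start + (beta_end - beta_start) * t / (num_steps - 1)
             for t in range(num_steps)]
    # alpha_t = 1 - beta_t; alpha_bar_t is the running product, which
    # controls how much of the clean signal survives at step t.
    alphas = [1.0 - b for b in betas]
    alpha_bars = []
    prod = 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars
```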
Experiment Setup: Yes
  "For all experiments with residual ∇-DB, we set the learning rate to 1e-3 and ablate over a set of choices of reward temperature β... We set the output regularization strength λ = 2000 in Aesthetic Score experiments and λ = 5000 in HPSv2 and Image Reward experiments... For HPSv2 and Image Reward experiments, we set β to be 500000 and 10000, respectively. For each epoch, we collect 64 generation trajectories... We set the number of gradient accumulation steps to 4..."
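The quoted hyperparameters can be gathered into one configuration sketch. The dictionary keys are illustrative names chosen here, not identifiers from the paper's code, and the small helper only demonstrates how gradient accumulation reduces the number of optimizer updates:

```python
# Values quoted from the experiment-setup entry above.
CONFIG = {
    "learning_rate": 1e-3,
    "lambda_output_reg": {"aesthetic": 2000, "hpsv2": 5000, "image_reward": 5000},
    "reward_temperature_beta": {"hpsv2": 500_000, "image_reward": 10_000},
    "trajectories_per_epoch": 64,
    "grad_accumulation_steps": 4,
}

def num_optimizer_updates(num_minibatches,
                          accum=CONFIG["grad_accumulation_steps"]):
    # With gradient accumulation, gradients from `accum` consecutive
    # minibatches are summed before a single optimizer step, so the
    # effective batch size grows by a factor of `accum`.
    return num_minibatches // accum
```

For example, 32 minibatches with 4 accumulation steps yield 8 optimizer updates.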