Test-time Alignment of Diffusion Models without Reward Over-optimization
Authors: Sunwoo Kim, Minkyu Kim, Dongmin Park
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate its effectiveness in single-reward optimization, multi-objective scenarios, and online black-box optimization. This work offers a robust solution for aligning diffusion models with diverse downstream objectives without compromising their general capabilities. ... We empirically validate DAS's effectiveness across diverse scenarios, including single-reward, multi-objective, and online black-box optimization tasks. |
| Researcher Affiliation | Collaboration | Sunwoo Kim (Seoul National University); Minkyu Kim, Dongmin Park (KRAFTON) |
| Pseudocode | Yes | The pseudo-code of the final algorithm with adaptive resampling is given in Algorithm A.1. ... Detailed pseudocode for our full DAS algorithm is included in Appendix A, with versions covering adaptive resampling (Algorithm 1), adaptive tempering (Algorithm 3), and adaptation to the online setting (Algorithm 5). |
| Open Source Code | Yes | Code is available at https://github.com/krafton-ai/DAS. |
| Open Datasets | Yes | For single reward tasks, we use aesthetic scores (Schuhmann et al., 2022) and human preference evaluated by PickScore (Kirstain et al., 2023) as objectives. For fine-tuning methods, we used animals from ImageNet (Deng et al., 2009) and prompts from Human Preference Dataset v2 (HPDv2) (Wu et al., 2023b) when training on aesthetic score and PickScore respectively, like previous settings (Black et al., 2023; Clark et al., 2024). |
| Dataset Splits | No | Evaluation uses unseen prompts from the same dataset. ... We used HPDv2 prompts for training and evaluation. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used (e.g., GPU models, CPU types, or cloud instance specifications) for running its experiments. |
| Software Dependencies | No | We used the official PyTorch codebases of DDPO, AlignProp, and TDPO with minimal change of hyperparameters from the settings in the original papers and codebases. We adapted the official PyTorch codebases of FreeDoM and MPGD to work with the diffusers library. |
| Experiment Setup | Yes | For fine-tuning methods, we used 200 epochs and an effective batch size of 256, using gradient accumulation if needed, for all methods. ... Across all experiment results except ablation studies, we used 100 diffusion time steps with γ = 0.008 for tempering. ... For single reward experiments, we used KL coefficient α = 0.01 for the aesthetic score task and α = 0.0001 for the PickScore task, considering the scale of the rewards. For multi-objective experiments and online black-box optimization, we used α = 0.005. We used 16 particles if not explicitly mentioned. ... For pre-training via conditional score matching, we used learning rate 0.001 with 1000 epochs. For DDS, we used learning rate 3e-5 with 300 epochs. We used the Adam optimizer for all training or fine-tuning. |
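The pseudocode row above notes that DAS's Algorithm 1 uses adaptive resampling over a set of particles (16 by default, per the experiment setup). A minimal sketch of the standard effective-sample-size (ESS) criterion behind such adaptive resampling is shown below; the function names, the 0.5 ESS threshold, and multinomial resampling are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

NUM_PARTICLES = 16  # default particle count quoted in the experiment setup


def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalized weights; ranges from 1 to N."""
    w = weights / weights.sum()
    return 1.0 / np.sum(w ** 2)


def adaptive_resample(particles, weights, threshold=0.5):
    """Resample only when the ESS drops below threshold * N.

    When triggered, particles are drawn multinomially in proportion to
    their weights and the weights are reset to uniform; otherwise the
    particle set is kept and the weights are merely normalized.
    """
    n = len(weights)
    if effective_sample_size(weights) < threshold * n:
        probs = weights / weights.sum()
        idx = np.random.choice(n, size=n, p=probs)
        return particles[idx], np.full(n, 1.0 / n)
    return particles, weights / weights.sum()
```

With uniform weights the ESS equals the particle count, so no resampling occurs; once a few particles dominate the weight mass, the ESS collapses and the resampling branch fires. Applying this check at each of the 100 diffusion steps is what makes the resampling "adaptive" rather than unconditional.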