Diffusion-RainbowPA: Improvements Integrated Preference Alignment for Diffusion-based Text-to-Image Generation

Authors: Haoyuan Sun, Bin Liang, Bo Xia, Jiaqi Wu, Yifei Zhao, Kai Qin, Yongzhe Chang, Xueqian Wang

TMLR 2025

Reproducibility Assessment — each entry lists the variable, the assessed result, and the supporting evidence:
Research Type: Experimental. Evidence: "With a comprehensive evaluation and comparison of alignment performance, it is demonstrated that Diffusion-RainbowPA outperforms current state-of-the-art methods. We also conduct ablation studies on the introduced components, which reveal that incorporating each positively enhances alignment performance."
Researcher Affiliation: Academia. Evidence: Haoyuan Sun (Tsinghua University), Bin Liang (University of Technology Sydney), Bo Xia (Tsinghua University), Jiaqi Wu (Tsinghua University), Yifei Zhao (Tsinghua University), Kai Qin (Tsinghua University), Yongzhe Chang (Tsinghua University), Xueqian Wang (Tsinghua University).
Pseudocode: No. Evidence: The paper describes its methods using mathematical formulations and prose but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code: No. Evidence: The paper does not provide an explicit statement about releasing its own source code, nor a link to a code repository for the described methodology. It mentions using an "open-source step-aware preference model," but that refers to a third-party artifact, not the authors' own code.
Open Datasets: Yes. Evidence: "In selecting the training dataset, to ensure a fair comparison, we adopt the same dataset utilized by SPO (Liang et al., 2024), which consists of 4K randomly chosen prompts from the Pick-a-Pic V1 dataset. ... on four zero-shot datasets: GenEval (Ghosh et al., 2024), T2I-CompBench++ (Huang et al., 2025), GenAI-Bench (Li et al., 2024a), and DPG-Bench (Hu et al., 2024)."
Dataset Splits: No. Evidence: The paper mentions training on 4K randomly chosen prompts from the Pick-a-Pic V1 dataset and evaluating on four zero-shot datasets, but it does not specify explicit training/validation/test splits (e.g., percentages or exact counts) or describe how the data were partitioned, so the partitioning cannot be reproduced.
Hardware Specification: Yes. Evidence: "In this study, the experiments are conducted on a machine equipped with 4 NVIDIA A100-PCIE-40GB GPUs. ... on consumer-grade graphics cards, specifically utilizing a machine equipped with 4 NVIDIA GeForce RTX 3090 GPUs (each with 24GB of memory)."
Software Dependencies: No. Evidence: The paper does not provide specific version numbers for any software libraries, frameworks, or solvers used in the experiments.
Experiment Setup: Yes. Evidence: "Hyperparameters. We simultaneously set all terms in Equation (12) to share β = 10, corresponding to the SPO condition. Based on the tuning results reported in (Sun et al., 2025d), the positive enhancement intensity λ in Equation (11) is set to 100 and the threshold to log 0.9; for the MSPA term, the margin strengthening intensity η is empirically set to 0.5."
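As a minimal sketch of how the reported hyperparameters could be pinned down for reproduction, the values quoted above can be collected into a single configuration object. The class and field names below are hypothetical (the paper does not name them); only the numeric values — β = 10, λ = 100, threshold = log 0.9, η = 0.5 — come from the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RainbowPAHyperparams:
    """Hyperparameters reported for Diffusion-RainbowPA (names hypothetical)."""
    # Shared strength beta for all terms in the paper's Equation (12),
    # matching the SPO condition.
    beta: float = 10.0
    # Positive enhancement intensity lambda in the paper's Equation (11).
    positive_enhancement_lambda: float = 100.0
    # Threshold used with the positive-enhancement term, set to log 0.9.
    threshold: float = math.log(0.9)
    # Margin strengthening intensity eta for the MSPA term.
    mspa_eta: float = 0.5


params = RainbowPAHyperparams()
print(params.threshold)  # log 0.9 ≈ -0.105
```

A frozen dataclass makes the configuration immutable and hashable, so the same object can be logged alongside results to document exactly which settings produced a run.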