SEE-DPO: Self Entropy Enhanced Direct Preference Optimization
Authors: Shivanshu Shekhar, Shreyas Singh, Tong Zhang
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that integrating human feedback with self-entropy regularization can significantly boost image diversity and specificity, achieving state-of-the-art results on key image generation metrics. ... We carry out empirical studies to demonstrate that this regularization technique encourages broader exploration of the solution space, reducing overfitting and preventing reward hacking. ... Our models, trained using the proposed objective, outperform or are comparable to baseline methods across all quality metrics, as shown in Table 2. ... We also conducted a deeper ablation study using SPO, exploring various values of β and γ. In Fig. 5, the left image shows the results of keeping β fixed at 0.1 while varying γ. |
| Researcher Affiliation | Collaboration | Shivanshu Shekhar (Siebel School of Computing and Data Science, University of Illinois Urbana-Champaign); Shreyas Singh (Fractal AI Research); Tong Zhang (Siebel School of Computing and Data Science, University of Illinois Urbana-Champaign) |
| Pseudocode | No | The paper includes mathematical derivations and formulas but does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: "For a fair comparison, we used the official implementations of D3PO, Diffusion DPO, and SPO with their default parameters." and mentions "SPO's official GitHub repository". This refers to code from other works, not the specific implementation of SEE-DPO developed in this paper. There is no explicit statement or link indicating that the authors' own code for SEE-DPO is publicly available. |
| Open Datasets | Yes | For training, we used 4,000 prompts from the Pick-a-Pic-V1 dataset Kirstain et al. (2023) for SPO and D3PO, following the dataset provided in SPO's official GitHub repository. For Diffusion-DPO, we used 800,000 prompts from the same dataset. ... Additionally, we conduct a user study similar to Liang et al. (2024). We recruit 10 participants to evaluate images generated by different models based on 300 prompts sampled from Parti Prompts and HPSv2 in a 1:2 ratio. |
| Dataset Splits | Yes | For training, we used 4,000 prompts from the Pick-a-Pic-V1 dataset Kirstain et al. (2023) for SPO and D3PO, following the dataset provided in SPO's official GitHub repository. For Diffusion-DPO, we used 800,000 prompts from the same dataset. Each model was trained using the same setup and data splits as specified in their original implementations. ... We report results on the validation_unique split of the Pick-a-Pic V1 dataset, which contains 500 prompts, as shown in Tables 1, 2, 4 and 5. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, or memory) used for running the experiments. It only mentions general training without hardware specifics. |
| Software Dependencies | No | The paper mentions using "official implementations of D3PO, Diffusion DPO, and SPO" but does not specify version numbers for these or any other software libraries (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | Experimental Setting: For a fair comparison, we used the official implementations of D3PO, Diffusion DPO, and SPO with their default parameters. We trained these models using our proposed regularized loss function, as described in the Methodology section. When applying our method, we treated only γ and β as hyperparameters while keeping all other settings at their default values to ensure a fair evaluation. During inference, we set the guidance scale to 7.5 for consistency across models. ... Hyperparameter values (Table 3): D3PO γ=5, β=0.01; Diffusion-DPO γ=3, β=4; SPO γ=3, β=0.1. All other hyperparameter values were fixed to those of the original implementations. |
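The objective described above, a DPO-style preference loss combined with a self-entropy regularizer weighted by γ, can be sketched as follows. This is an illustrative reconstruction, not the paper's exact formulation: the function name `see_dpo_loss`, the scalar log-probability inputs, and the additive form of the entropy bonus are all assumptions made for the sketch; only the roles of β (preference-margin temperature) and γ (entropy weight) come from the table.

```python
import math


def see_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                 entropy, beta=0.1, gamma=3.0):
    """Sketch of a self-entropy-regularized DPO objective (hypothetical form).

    logp_w / logp_l: policy log-probabilities of the preferred / rejected sample.
    ref_logp_w / ref_logp_l: same quantities under the frozen reference model.
    entropy: an estimate of the policy's self-entropy (higher = more diverse).
    """
    # Standard DPO preference term: -log sigmoid(beta * margin), where the
    # margin compares policy-vs-reference log-ratios of the two samples.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    preference_loss = math.log(1.0 + math.exp(-margin))

    # Entropy bonus: subtracting gamma * entropy rewards broader exploration,
    # which the paper argues reduces overfitting and reward hacking.
    return preference_loss - gamma * entropy
```

With a zero margin the preference term reduces to log 2, and increasing either the margin or the entropy estimate lowers the loss, matching the intuition that the objective favors both correct rankings and diverse outputs.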