Forward KL Regularized Preference Optimization for Aligning Diffusion Policies

Authors: Zhao Shan, Chenyou Fan, Shuang Qiu, Jiyuan Shi, Chenjia Bai

AAAI 2025

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "We conduct extensive experiments for Meta-World manipulation and D4RL tasks. The results show our method exhibits superior alignment with preferences and outperforms previous state-of-the-art algorithms." Results for FKPD and competitors are presented in Table 1 (Meta-World) and Table 2 (D4RL).

Researcher Affiliation | Collaboration | 1. Institute of Artificial Intelligence (TeleAI), China Telecom; 2. Tsinghua University; 3. Northwestern Polytechnical University, Xi'an; 4. Hong Kong University of Science and Technology; 5. Shenzhen Research Institute of Northwestern Polytechnical University

Pseudocode | No | The paper describes the methodology in detail using mathematical formulations and descriptive text, but it does not include a clearly labeled pseudocode or algorithm block.

Open Source Code | No | The paper contains no explicit statement about releasing source code and provides no link to a code repository.

Open Datasets | Yes | "We evaluate the performance of FKPD on two well-known benchmarks: Meta-World robotics tasks (Yu et al. 2020) and D4RL locomotion (Fu et al. 2020) tasks."

Dataset Splits | No | The paper describes how the preference dataset is collected (e.g., "uniformly sampling segments of length 64") and mentions different types of preference data ("2.5k Dense and 20k sparse"), but the main text does not specify how these data are split into training, validation, or test sets for experimental evaluation.

Hardware Specification | No | The paper does not specify the hardware used to run its experiments.

Software Dependencies | No | The paper does not list software dependencies with version numbers that would be needed to replicate the experiments.

Experiment Setup | No | The paper states, "The architecture of the policy network, along with evaluation details and hyperparameter settings, are provided in Appendix 3-4," indicating that these details are not present in the main text.