Forward KL Regularized Preference Optimization for Aligning Diffusion Policies

Authors: Zhao Shan, Chenyou Fan, Shuang Qiu, Jiyuan Shi, Chenjia Bai

AAAI 2025

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "We conduct extensive experiments for Meta-World manipulation and D4RL tasks. The results show our method exhibits superior alignment with preferences and outperforms previous state-of-the-art algorithms." Results for FKPD and competitors are presented in Table 1 (Meta-World) and Table 2 (D4RL).

Researcher Affiliation | Collaboration | 1. Institute of Artificial Intelligence (TeleAI), China Telecom; 2. Tsinghua University; 3. Northwestern Polytechnical University, Xi'an; 4. Hong Kong University of Science and Technology; 5. Shenzhen Research Institute of Northwestern Polytechnical University

Pseudocode | No | The paper describes the methodology in detail using mathematical formulations and descriptive text, but it does not include a clearly labeled pseudocode or algorithm block.

Open Source Code | No | The paper contains no explicit statement about releasing source code and provides no link to a code repository.

Open Datasets | Yes | "We evaluate the performance of FKPD on two well-known benchmarks: Meta-World robotics tasks (Yu et al. 2020) and D4RL locomotion (Fu et al. 2020) tasks."

Dataset Splits | No | The paper describes how the preference dataset is collected (e.g., "uniformly sampling segments of length 64") and mentions different types of preference data ("2.5k Dense and 20k sparse"), but the main text does not specify how these data are split into training, validation, or test sets for experimental evaluation.

Hardware Specification | No | The paper does not specify the hardware used to run its experiments.

Software Dependencies | No | The paper does not list software dependencies with version numbers that would be needed to replicate the experiments.

Experiment Setup | No | The paper states, "The architecture of the policy network, along with evaluation details and hyperparameter settings, are provided in Appendix 3-4," indicating that these details are not present in the main text.