Forward KL Regularized Preference Optimization for Aligning Diffusion Policies
Authors: Zhao Shan, Chenyou Fan, Shuang Qiu, Jiyuan Shi, Chenjia Bai
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments for Meta World manipulation and D4RL tasks. The results show our method exhibits superior alignment with preferences and outperforms previous state-of-the-art algorithms. The results of FKPD and competitors for Meta World are presented in Table 1. The results of FKPD and competitors for D4RL are demonstrated in Table 2. |
| Researcher Affiliation | Collaboration | (1) Institute of Artificial Intelligence (Tele AI), China Telecom; (2) Tsinghua University; (3) Northwestern Polytechnical University, Xi'an; (4) Hong Kong University of Science and Technology; (5) Shenzhen Research Institute of Northwestern Polytechnical University |
| Pseudocode | No | The paper describes the methodology in detail using mathematical formulations and descriptive text, but it does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository. |
| Open Datasets | Yes | We evaluate the performance of FKPD on two well-known benchmarks: Meta World robotics tasks (Yu et al. 2020) and D4RL locomotion (Fu et al. 2020) tasks. |
| Dataset Splits | No | The paper describes how the preference dataset is collected (e.g., 'uniformly sampling segments of length 64') and mentions two types of preference data ('2.5k Dense and 20k sparse'), but the main text does not specify how these datasets are split into training, validation, or test sets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers that would be needed to replicate the experiment. |
| Experiment Setup | No | The paper states, 'The architecture of the policy network, along with evaluation details and hyperparameter settings, are provided in Appendix 3-4.' These details are therefore deferred to the appendix and are not given in the main text. |