Diverse Policies Recovering via Pointwise Mutual Information Weighted Imitation Learning
Authors: Hanlin Yang, Jian Yao, Weiming Liu, Qing Wang, Hanmin Qin, Kong Hansheng, Kirk Tang, Jiechao Xiong, Chao Yu, Kai Li, Junliang Xing, Hongwu Chen, Juchao Zhuo, Qiang Fu, Yang Wei, Haobo Fu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide theoretical justifications for our new objective, and extensive empirical evaluations confirm the effectiveness of our method in recovering diverse policies from expert data. |
| Researcher Affiliation | Collaboration | 1 Sun Yat-sen University, Guangzhou, China; 2 Tencent AI Lab, Shenzhen, China; 3 Institute of Automation, Chinese Academy of Sciences, Beijing, China; 4 Tsinghua University, Beijing, China |
| Pseudocode | Yes | The pseudo-code for the BC-PMI algorithm is shown in Algorithm 1. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | The datasets utilized in this study are sourced from Atari-Head (Zhang et al., 2018; 2020), an extensive collection of human game-play data. In this experiment, we validate our method on the dataset of a collection of professional basketball player trajectories (Zhan et al., 2020) with the goal of recovering policies that can generate trajectories with diverse player-movement styles. |
| Dataset Splits | No | The paper mentions using "offline expert trajectories" for Circle 2D and sourcing data from "Atari-Head" and a "professional basketball player dataset", but does not provide specific details on how these datasets are split into training, validation, or test sets (e.g., percentages, sample counts, or explicit standard splits). |
| Hardware Specification | Yes | All experiments in this paper are implemented with PyTorch and executed on NVIDIA Tesla T4 GPUs. |
| Software Dependencies | No | All experiments in this paper are implemented with PyTorch and executed on NVIDIA Tesla T4 GPUs. All the runs in experiments use 5 random seeds. |
| Experiment Setup | Yes | Table 5: Common hyperparameters setting. Values are listed in the order Circle 2D / Ms Pacman / Space Invaders / Basketball. Learning rate: 0.001 / 0.001 / 0.001 / 0.0002; optimizer: Adam / Adam / Adam / Adam; epoch: 10 / 30 / 30 / 30; batch size: 128 / 512 / 512 / 128; hidden dim: 32 / 64 / 64 / 128. |
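
Since the paper does not release code, anyone reproducing the experiments has to transcribe the Table 5 settings by hand. The following is a minimal sketch of how those values could be held in a Python configuration. The names `TrainConfig`, `HYPERPARAMS`, and `get_config` are hypothetical and do not come from the paper; only the numeric values and the optimizer name are taken from Table 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Per-task training settings transcribed from Table 5 (hypothetical container)."""
    learning_rate: float
    optimizer: str
    epochs: int
    batch_size: int
    hidden_dim: int


# Keys follow the column headers of Table 5; values are copied from the table.
HYPERPARAMS = {
    "Circle 2D": TrainConfig(0.001, "Adam", 10, 128, 32),
    "Ms Pacman": TrainConfig(0.001, "Adam", 30, 512, 64),
    "Space Invaders": TrainConfig(0.001, "Adam", 30, 512, 64),
    "Basketball": TrainConfig(0.0002, "Adam", 30, 128, 128),
}


def get_config(task: str) -> TrainConfig:
    """Return the settings for a task, failing loudly on an unknown name."""
    try:
        return HYPERPARAMS[task]
    except KeyError:
        known = ", ".join(sorted(HYPERPARAMS))
        raise ValueError(f"Unknown task {task!r}; expected one of: {known}") from None


if __name__ == "__main__":
    for name in HYPERPARAMS:
        print(name, get_config(name))
```

Settings that Table 5 does not cover (network architecture, the random seeds used for the 5 runs, dataset splits) are deliberately left out of this sketch, because the paper excerpts above do not specify them.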