reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Diverse Policies Recovering via Pointwise Mutual Information Weighted Imitation Learning

Authors: Hanlin Yang, Jian Yao, Weiming Liu, Qing Wang, Hanmin Qin, Kong Hansheng, Kirk Tang, Jiechao Xiong, Chao Yu, Kai Li, Junliang Xing, Hongwu Chen, Juchao Zhuo, Qiang Fu, Yang Wei, Haobo Fu

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We provide theoretical justifications for our new objective, and extensive empirical evaluations confirm the effectiveness of our method in recovering diverse policies from expert data.
Researcher Affiliation	Collaboration	1 Sun Yat-sen University, Guangzhou, China 2 Tencent AI Lab, Shenzhen, China 3 Institute of Automation, Chinese Academy of Sciences, Beijing, China 4 Tsinghua University, Beijing, China
Pseudocode	Yes	The pseudo-code for the BC-PMI algorithm is shown in Algorithm 1.
Open Source Code	No	The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository.
Open Datasets	Yes	The datasets utilized in this study are sourced from Atari-Head (Zhang et al., 2018; 2020), an extensive collection of human game-play data. In this experiment, we validate our method on the dataset of a collection of professional basketball player trajectories (Zhan et al., 2020) with the goal of recovering policies that can generate trajectories with diverse player-movement styles.
Dataset Splits	No	The paper mentions using "offline expert trajectories" for Circle 2D and sourcing data from "Atari-Head" and a "professional basketball player dataset", but does not provide specific details on how these datasets are split into training, validation, or test sets (e.g., percentages, sample counts, or explicit standard splits).
Hardware Specification	Yes	All experiments in this paper are implemented with PyTorch and executed on NVIDIA Tesla T4 GPUs.
Software Dependencies	No	All experiments in this paper are implemented with PyTorch and executed on NVIDIA Tesla T4 GPUs. All the runs in experiments use 5 random seeds.
Experiment Setup	Yes	Table 5: Common hyperparameters setting. Values are listed in the order Circle 2D / Ms Pacman / Space Invaders / Basketball. learning rate: 0.001 / 0.001 / 0.001 / 0.0002; optimizer: Adam / Adam / Adam / Adam; epoch: 10 / 30 / 30 / 30; batch size: 128 / 512 / 512 / 128; hidden dim: 32 / 64 / 64 / 128.
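
For anyone attempting a reproduction, the Table 5 settings quoted above could be written down as a small configuration object. This is only a sketch: the class name `TrainConfig`, the dictionary keys, and the concrete seed values are hypothetical and do not come from the paper, which releases no code. The numeric values are copied from the table, and the seed count from the Software Dependencies row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyperparameters quoted from Table 5."""
    learning_rate: float
    optimizer: str
    epochs: int
    batch_size: int
    hidden_dim: int


# Keys are illustrative identifiers (an assumption); values are from Table 5.
CONFIGS = {
    "circle_2d": TrainConfig(0.001, "Adam", 10, 128, 32),
    "ms_pacman": TrainConfig(0.001, "Adam", 30, 512, 64),
    "space_invaders": TrainConfig(0.001, "Adam", 30, 512, 64),
    "basketball": TrainConfig(0.0002, "Adam", 30, 128, 128),
}

# The report states that 5 random seeds are used per run; the actual seed
# values are not given, so 0..4 is an assumption.
SEEDS = tuple(range(5))

if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        print(name, cfg)
```

Anything not listed in the table (network architecture beyond the hidden dimension, dataset splits, library versions) would still have to be chosen by the reproducer, which is consistent with the "No" results recorded above.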