PN-GAIL: Leveraging Non-optimal Information from Imperfect Demonstrations

Authors: Qiang Liu, Huiqiao Fu, Kaiqiang Tang, Chunlin Chen, Daoyi Dong

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that PN-GAIL surpasses conventional baseline methods in dealing with imperfect demonstrations, thereby significantly augmenting the practical utility of imitation learning in real-world contexts. Our codes are available at https://github.com/QiangLiuT/PN-GAIL. Experiments on six control tasks are conducted to show the efficiency of our method in dealing with imperfect demonstrations compared to baseline methods.
Researcher Affiliation | Academia | Qiang Liu, Huiqiao Fu, Kaiqiang Tang & Chunlin Chen, School of Management and Engineering, Nanjing University, Nanjing, China (EMAIL, EMAIL); Daoyi Dong, The Australian Artificial Intelligence Institute, University of Technology Sydney, Sydney, Australia (EMAIL)
Pseudocode | Yes | The pseudocode for the overall algorithm can be found in Appendix A (Algorithm 1, PN-GAIL).
Open Source Code | Yes | Our codes are available at https://github.com/QiangLiuT/PN-GAIL.
Open Datasets | Yes | Task setup: We conduct experiments across six environments (Pendulum-v1, Ant-v2, Walker2d-v2, Hopper-v2, Swimmer-v2, and HalfCheetah-v2). ... For the Ant-v2, Walker2d-v2, Hopper-v2, Swimmer-v2, and HalfCheetah-v2 environments, to maintain fairness, we directly utilize the demonstrations and confidence scores provided by the code of 2IWIL.
Dataset Splits | Yes | During the practical experiments across all six environments, 20% of the given demonstrations are randomly selected to be assigned confidence scores, which means that the label ratio is 0.2. ... In our experiments, we use different numbers of Dc + Du for different tasks, and the specific values are shown in Appendix C.1. Table 3 shows the number of confidence data and unlabeled data used for each task...
Hardware Specification | Yes | All of our experiments are run on a single machine with 4 NVIDIA GeForce RTX 3080 GPUs.
Software Dependencies | No | The paper mentions TRPO, PPO, and SAC as RL methods and Adam as an optimizer, but does not provide specific software library versions (e.g., Python or PyTorch versions) for reproducibility.
Experiment Setup | Yes | Table 2: Hyper-parameter settings — γ: 0.995; τ (Generalized Advantage Estimation): 0.97; batch size: 5,000; learning rate (value network): 3e-4; learning rate (discriminator): 1e-3; learning rate (classifier): 3e-4; optimizer: Adam.
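The reported split protocol (label ratio 0.2, yielding confidence-labeled data Dc and unlabeled data Du) and the Table 2 hyper-parameters can be collected into a minimal sketch. This is an illustrative reconstruction, not the authors' released code: the CONFIG keys and the split_demonstrations helper are hypothetical names; only the numeric values come from the excerpts above.

```python
import random

# Hyper-parameters from the paper's Table 2, plus the reported label ratio.
# The dict layout itself is illustrative, not the authors' config format.
CONFIG = {
    "gamma": 0.995,          # discount factor γ
    "gae_tau": 0.97,         # τ for Generalized Advantage Estimation
    "batch_size": 5000,
    "lr_value": 3e-4,        # value-network learning rate
    "lr_discriminator": 1e-3,
    "lr_classifier": 3e-4,
    "optimizer": "Adam",
    "label_ratio": 0.2,      # 20% of demonstrations receive confidence scores
}

def split_demonstrations(demos, label_ratio=0.2, seed=0):
    """Randomly assign `label_ratio` of the demonstrations to the
    confidence-labeled set Dc; the remainder form the unlabeled set Du."""
    rng = random.Random(seed)
    shuffled = list(demos)
    rng.shuffle(shuffled)
    n_labeled = int(len(shuffled) * label_ratio)
    return shuffled[:n_labeled], shuffled[n_labeled:]

dc, du = split_demonstrations(range(1000), CONFIG["label_ratio"])
print(len(dc), len(du))  # 200 800
```

With a label ratio of 0.2, a pool of 1,000 demonstration samples splits into 200 confidence-labeled and 800 unlabeled samples, matching the protocol described in the Dataset Splits row.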