Diversifying Policy Behaviors with Extrinsic Behavioral Curiosity

Authors: Zhenglin Wan, Xingrui Yu, David Mark Bossens, Yueming Lyu, Qing Guo, Flint Xiaofeng Fan, Yew-Soon Ong, Ivor Tsang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To validate the effectiveness of EBC in exploring diverse behaviors, we evaluate our method on multiple robot locomotion tasks. EBC improves the performance of QD-IRL instances with GAIL, VAIL, and Diff AIL across all included environments by up to 185%, 42%, and 150% respectively, even surpassing expert performance by 20% in Humanoid. Furthermore, we demonstrate that EBC is applicable to Gradient Arborescence-based Quality Diversity Reinforcement Learning (QD-RL) algorithms, where it substantially improves performance and provides a generic technique for learning behaviorally diverse policies.
Researcher Affiliation | Academia | 1. School of Data Science, The Chinese University of Hong Kong, Shenzhen, China; 2. CFAR, Agency for Science, Technology and Research, Singapore; 3. IHPC, Agency for Science, Technology and Research, Singapore; 4. College of Computing and Data Science, Nanyang Technological University (NTU), Singapore. Correspondence to: Xingrui Yu <EMAIL>.
Pseudocode | Yes | We provide the pseudo-code of the general QD-IRL procedure with PPGA in Algorithm 1; the different IRL methods differ only in the reward-model update step and in the other steps that use the reward model to compute the learned reward (highlighted in red). Please refer to Appendix B for the algorithms for updating the archive (Algorithm 2), updating the reward model, calculating the rewards with the EBC bonus, and computing the gradients for the objective and measures (Algorithm 3).
Open Source Code | Yes | The source code of this work is provided at https://github.com/vanzll/EBC.
Open Datasets | No | We use a policy archive obtained by PPGA to generate expert demonstrations. In line with a real-world scenario with limited demonstrations, we first sample the top 500 high-performance elites from the archive as a candidate pool, and then select a few demonstrations such that they are as diverse as possible. This process results in 4 diverse demonstrations (episodes) per environment. Appendix D shows the statistical properties for selected demonstrations.
Dataset Splits | No | We use a policy archive obtained by PPGA to generate expert demonstrations. In line with a real-world scenario with limited demonstrations, we first sample the top 500 high-performance elites from the archive as a candidate pool, and then select a few demonstrations such that they are as diverse as possible. This process results in 4 diverse demonstrations (episodes) per environment. Appendix D shows the statistical properties for selected demonstrations. The paper does not specify how these demonstrations or any other data were split into training, validation, or test sets.
Hardware Specification | Yes | All experiments are conducted on a system with four A40 48G GPUs, an AMD EPYC 7543P 32-core CPU, and a Linux OS. Each single experiment requires only one A40 48G GPU and takes roughly two days.
Software Dependencies | No | Our experiments are based on the PPGA implementation using the Brax simulator (Freeman et al., 2021), enhanced with QDax wrappers for measure calculation (Lim et al., 2022). We leverage pyribs (Tjanaka et al., 2023) and CleanRL's PPO (Huang et al., 2020) for implementing the PPGA algorithm. The paper mentions these software components but does not provide specific version numbers for them.
Experiment Setup | Yes | Appendix E. Hyperparameter Setting. Table 3: List of relevant hyperparameters for PPGA shared across all environments. Table 4: List of relevant hyperparameters for AIRL and GAIL shared across all environments. Table 5: List of relevant hyperparameters for VAIL shared across all environments. Table 6: List of relevant hyperparameters for GIRIL shared across all environments. Table 7: List of relevant hyperparameters for Diff AIL and Diff AIL-EBC.
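The demonstration-selection procedure quoted in the Open Datasets and Dataset Splits rows (sample the top 500 high-performance elites as a candidate pool, then pick a few maximally diverse episodes) can be sketched as a greedy farthest-point selection in behavior-measure space. This is a minimal sketch for illustration only: the function name, array shapes, and the farthest-point heuristic are assumptions, not the paper's exact selection routine.

```python
import numpy as np

def select_diverse_demos(measures, scores, pool_size=500, n_demos=4):
    """Pick `n_demos` diverse elites from the top-`pool_size` candidates.

    measures : (N, d) array of behavior measures, one row per archive elite.
    scores   : (N,) array of elite performance scores.
    Returns indices into the original archive of the selected elites.
    """
    # Candidate pool: indices of the highest-scoring elites.
    pool = np.argsort(scores)[::-1][:pool_size]
    pts = measures[pool]

    # Greedy farthest-point selection: start from the single best elite,
    # then repeatedly add the candidate whose nearest selected neighbor
    # (in measure space) is farthest away.
    chosen = [0]
    for _ in range(n_demos - 1):
        dists = np.linalg.norm(pts[:, None, :] - pts[chosen][None, :, :], axis=-1)
        nearest = dists.min(axis=1)   # distance to closest already-selected point
        nearest[chosen] = -np.inf     # never re-pick a selected candidate
        chosen.append(int(nearest.argmax()))
    return pool[chosen]
```

Farthest-point selection is one common way to operationalize "as diverse as possible" over a fixed candidate pool; the paper does not state which diversity criterion it actually used.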