Provably Efficient Exploration in Inverse Constrained Reinforcement Learning

Authors: Bo Yue, Jian Li, Guiliang Liu

ICML 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | "To empirically study how well our method captures the accurate constraint, we conduct evaluations under different environments. The experimental results show that PCSE significantly outperforms other exploration strategies and applies to continuous environments."
Researcher Affiliation | Academia | "School of Data Science, The Chinese University of Hong Kong, Shenzhen; Stony Brook University, New York. Correspondence to: Guiliang Liu <EMAIL>."
Pseudocode | Yes | "Algorithm 1: BEAR and PCSE for ICRL in an unknown environment"
Open Source Code | No | The paper states: "Our implementation of code for discrete environments is adapted from (Liu et al., 2023), and for continuous environments, it is adapted from (Lazcano et al., 2024)." While it references external code, it does not provide a specific link or explicit statement that *their* implementation of the described methodology is open-source or publicly available.
Open Datasets | Yes | "Our implementation of code for discrete environments is adapted from (Liu et al., 2023), and for continuous environments, it is adapted from (Lazcano et al., 2024)." Lazcano, R., Andreas, K., Tai, J. J., Lee, S. R., and Terry, J. Gymnasium Robotics, 2024. URL http://github.com/Farama-Foundation/Gymnasium-Robotics. "Point Maze. In this environment, we create a map of 5 m × 5 m, where the area of each cell is 1 m × 1 m."
Dataset Splits | No | The paper describes generating data through interaction with custom Gridworld and Point Maze environments. While it specifies details about the environments and the number of episodes, it does not mention traditional dataset splits (e.g., train/test/validation percentages or counts) for a pre-collected dataset, as data is collected online during the reinforcement learning process.
Hardware Specification | Yes | "We ran experiments on a desktop computer with Intel(R) Core(TM) i5-14400F and NVIDIA GeForce RTX 2080 Ti."
Software Dependencies | No | The paper mentions adapting code from other works (Liu et al., 2023; Lazcano et al., 2024) and using algorithms like Deep Q Network (DQN) and Proximal Policy Optimization (PPO). However, it does not provide specific version numbers for any software dependencies, such as programming languages, libraries, or frameworks used in their implementation.
Experiment Setup | Yes | "In this paper, we create a map with dimensions of 7 × 7 units and define four distinct settings... The agent starts in the lower left cell (0, 0), and it has 8 actions... The reward in the reward state cell is 1, while all other cells have a 0 reward. The cost in a constraint location is also 1. The game continues until a maximum time step of 50 is reached. We plot the mean and 68% confidence interval (1-sigma error bar) computed with 5 random seeds (123456, 123, 1234, 36, 34) and exploration episodes n_e = 1. The ϵ-greedy strategy selects an action based on the ϵ-greedy algorithm, balancing exploration and exploitation with the exploration parameter ϵ = 1/√t. We first train a Deep Q Network (DQN) in advance... For algorithm BEAR, Proximal Policy Optimization (PPO) is utilized to obtain the exploration policy π_k. In this environment, we create a map of 5 m × 5 m... The constraint is initially set at the cell centered at (−1, 0)... The game terminates when a maximum time step of 500 is reached."
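The decaying exploration schedule ϵ = 1/√t quoted above is the one concrete algorithmic detail a reproducer can implement directly. The sketch below is an illustrative reimplementation, not the authors' code; the function name `epsilon_greedy_action` and the uniform-random fallback over actions are assumptions for the sake of a self-contained example:

```python
import numpy as np

def epsilon_greedy_action(q_values, t, rng):
    """Select an action epsilon-greedily with the decaying schedule
    epsilon = 1/sqrt(t) described in the paper's experiment setup.

    q_values: 1-D array of Q-value estimates for the current state.
    t: current time step (t >= 1, so epsilon starts at 1 and decays).
    rng: a numpy Generator, so runs are reproducible under a fixed seed.
    """
    epsilon = 1.0 / np.sqrt(t)
    if rng.random() < epsilon:
        # Explore: pick a uniformly random action.
        return int(rng.integers(len(q_values)))
    # Exploit: pick the greedy (highest-value) action.
    return int(np.argmax(q_values))
```

At t = 1 the schedule gives ϵ = 1, so early behavior is pure exploration; as t grows the policy becomes increasingly greedy, which matches the exploration/exploitation balance the row describes.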