reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Implicit Safe Set Algorithm for Provably Safe Reinforcement Learning

Authors: Weiye Zhao, Feihan Li, Tairan He, Changliu Liu

JAIR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We validate the proposed algorithm on the state-of-the-art Safety Gym benchmark, where it achieves zero safety violations while gaining 95% ± 9% cumulative reward compared to state-of-the-art safe DRL methods. Furthermore, the resulting algorithm scales well to high-dimensional systems with parallel computing. ... 8 Experimental Results
Researcher Affiliation	Academia	WEIYE ZHAO, Carnegie Mellon University, United States FEIHAN LI, Carnegie Mellon University, United States TAIRAN HE, Carnegie Mellon University, United States CHANGLIU LIU, Carnegie Mellon University, United States ... EMAIL, EMAIL, EMAIL, EMAIL, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States;
Pseudocode	Yes	Algorithm 1 Adaptive Momentum Boundary Approximation ... Algorithm 2 Implicit Safe Set Algorithm (ISSA) ... Algorithm 3 Convergence Trigger
Open Source Code	Yes	Our code is available on Github.¹ ¹ https://github.com/intelligent-control-lab/Implicit_Safe_Set_Algorithm
Open Datasets	Yes	We validate the proposed algorithm on the state-of-the-art Safety Gym benchmark ... We adopt Safety Gym (Ray et al. 2019) as our testing platform to evaluate the effectiveness of the proposed implicit safe set algorithms.
Dataset Splits	No	The average episode return 𝐽𝑟 and the average episodic sum of costs 𝑀𝑐 were obtained by averaging over the last five epochs of training to reduce noise. Cost rate 𝜌𝑐 was just taken from the final epoch. We report the results of these three metrics in Table 4 normalized by PPO results. The paper mentions training and evaluating policies on the Safety Gym benchmark but does not explicitly specify how the dataset within Safety Gym was split into training, validation, or test sets for reproduction purposes.
Hardware Specification	No	Our experiments use MuJoCo's unicycle and quadruped models whose low-level velocity controllers meet this reachability property. ... The underlying dynamics of Safety Gym is directly handled by MuJoCo physics simulator (Todorov et al. 2012). This indicates the dynamics is not explicitly accessible but rather can be implicitly evaluated, which is suitable for our proposed implicit safe set algorithm. No specific computational hardware (e.g., GPU, CPU models, memory) used for running the experiments is mentioned.
Software Dependencies	No	The paper mentions algorithm names like PPO, PPO-Lagrangian, CPO, and PPO-SL, and the physics simulator MuJoCo, but does not provide specific version numbers for any software libraries, frameworks, or environments used. For example, it lists
Experiment Setup	Yes	Table 2. Important hyper-parameters of PPO, PPO-Lagrangian, CPO, PPO-SL and PPO-ISSA: Timesteps per iteration 30000, Policy network hidden layers (256, 256), Value network hidden layers (256, 256), Policy learning rate 0.0004, Value learning rate 0.001, Target KL 0.01, Discounted factor 𝛾 0.99, Advantage discounted factor 𝜆 0.97, PPO Clipping 𝜖 0.2, TRPO Conjugate gradient damping (N/A) 0.1, TRPO Backtracking steps (N/A) 10, Cost limit (N/A) 0. Safety index parameters (constraint size = 0.05 / constraint size = 0.15): n = 1 / 1, k = 0.375 / 0.5, 𝜂 = 0 / 0.
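
For readers attempting a reproduction, the hyper-parameters in the Experiment Setup row and the metric aggregation described in the Dataset Splits row can be sketched in Python as below. This is a minimal sketch, not the authors' code: the dictionary keys and the function names `summarize` and `normalize_by_baseline` are hypothetical, and the per-algorithm "(N/A)" markings from the flattened table are not encoded because they cannot be recovered from this page.

```python
from statistics import mean

# Values transcribed from the Experiment Setup row (Table 2 of the paper).
# Key names are illustrative assumptions, not identifiers from the authors' repository.
HYPERPARAMS = {
    "timesteps_per_iteration": 30000,
    "policy_hidden_layers": (256, 256),
    "value_hidden_layers": (256, 256),
    "policy_learning_rate": 0.0004,
    "value_learning_rate": 0.001,
    "target_kl": 0.01,
    "discount_gamma": 0.99,
    "advantage_lambda": 0.97,
    "ppo_clip_epsilon": 0.2,
    "trpo_cg_damping": 0.1,
    "trpo_backtracking_steps": 10,
    "cost_limit": 0,
}

# Safety index parameters, keyed by constraint size.
SAFETY_INDEX = {
    0.05: {"n": 1, "k": 0.375, "eta": 0},
    0.15: {"n": 1, "k": 0.5, "eta": 0},
}


def summarize(returns, cost_sums, cost_rates, last_n=5):
    """Aggregate per-epoch logs as described in the report:
    J_r and M_c are averaged over the last `last_n` epochs,
    rho_c is taken from the final epoch only."""
    return {
        "J_r": mean(returns[-last_n:]),
        "M_c": mean(cost_sums[-last_n:]),
        "rho_c": cost_rates[-1],
    }


def normalize_by_baseline(metrics, baseline):
    """Divide each metric by the matching baseline (e.g. PPO) metric.
    Assumes every baseline value is non-zero."""
    return {name: metrics[name] / baseline[name] for name in metrics}
```

A run would pass its per-epoch logs to `summarize` and then divide by the PPO summary with `normalize_by_baseline`, mirroring the "normalized by PPO results" convention quoted above.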