Latent Safety-Constrained Policy Approach for Safe Offline Reinforcement Learning

Authors: Prajwal Koirala, Zhanhong Jiang, Soumik Sarkar, Cody Fleming

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive empirical evaluation on benchmark datasets, including challenging autonomous driving scenarios, demonstrates that our approach not only maintains safety compliance but also excels in cumulative reward optimization, surpassing existing methods. Additional visualizations provide further insights into the effectiveness and underlying mechanisms of our approach.
Researcher Affiliation | Academia | Prajwal Koirala, Zhanhong Jiang, Soumik Sarkar & Cody Fleming, Iowa State University, Ames, Iowa, USA
Pseudocode | Yes | Algorithm 1: LSPC Training
Open Source Code | Yes | The code is available here.
Open Datasets | Yes | Our evaluation uses the DSRL benchmark (Liu et al., 2023a), focusing on normalized return and normalized cost to measure performance.
Dataset Splits | No | The paper mentions evaluating methods on each dataset with three distinct target cost thresholds and across three random seeds, and in transfer experiments it is "evaluated across 20 episodes". However, it does not provide specific training/test/validation dataset splits (e.g., percentages, sample counts, or explicit standard split references) for the underlying pre-collected datasets.
Hardware Specification | Yes | The device used for reporting the training times in this section is a Dell Alienware Aurora R12 system with an 11th gen Intel Core i7 processor, 32 GB DDR4, and an NVIDIA GeForce RTX 3070 8GB GPU. All experiments were run on a CUDA device.
Software Dependencies | No | The paper mentions using the "CORL (Tarasov et al., 2024) implementation for Implicit Q-Learning (IQL)" and a "codebase inspired by the OSRL (Liu et al., 2023a) style", along with the "PyBullet physics simulator" and "MuJoCo physics simulator", but does not provide specific version numbers for any of these software components.
Experiment Setup | Yes | Table 2: Common Hyperparameters for IQL and AWR — Batch size (|B|): 1024; Discount factor (γ): 0.99; Soft update rate for Q-networks (τ): 0.005; Inverse temperature for reward: 2.0; Inverse temperature for cost: 2.0; Learning rate (all parameters): 3×10⁻⁴; Asymmetric L2 loss coefficient (ξ): 0.7; Max exp advantage weight (both cost and reward): 200.0.
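Two of the Table 2 hyperparameters point at standard components of the IQL/AWR pipeline the paper builds on: the asymmetric L2 loss coefficient (ξ = 0.7) is the expectile used in IQL's value regression, and the inverse temperatures and max weight (2.0, 200.0) parameterize the clipped exponential advantage weight in advantage-weighted regression. A minimal sketch under those usual formulations follows; the function and constant names are ours, not the paper's, and the authors' exact implementation may differ:

```python
import math

XI = 0.7       # asymmetric L2 loss coefficient (expectile), per Table 2
BETA = 2.0     # inverse temperature (same value for reward and cost critics)
W_MAX = 200.0  # max exp advantage weight

def expectile_loss(diff: float, xi: float = XI) -> float:
    """Asymmetric L2 loss: with xi > 0.5, positive residuals are
    penalized more than negative ones, so the regressed value tracks
    an upper expectile of the target distribution."""
    weight = xi if diff > 0 else (1.0 - xi)
    return weight * diff ** 2

def awr_weight(advantage: float, beta: float = BETA, w_max: float = W_MAX) -> float:
    """Clipped exponential advantage weight for advantage-weighted
    regression; the clip at w_max bounds the influence of any
    single high-advantage sample."""
    return min(math.exp(beta * advantage), w_max)
```

For example, a residual of +1 costs 0.7 while -1 costs only 0.3, and a large advantage saturates at the 200.0 cap rather than dominating the policy-extraction loss.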