ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning

Authors: Yarden As, Bhavya Sukhija, Lenart Treven, Carmelo Sferrazza, Stelian Coros, Andreas Krause

ICLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We empirically show that ACTSAFE obtains state-of-the-art performance in difficult exploration tasks on standard safe deep RL benchmarks while ensuring safety during learning.
Researcher Affiliation Academia Yarden As (ETH Zürich), Bhavya Sukhija (ETH Zürich), Lenart Treven (ETH Zürich), Carmelo Sferrazza (UC Berkeley), Stelian Coros (ETH Zürich), Andreas Krause (ETH Zürich)
Pseudocode Yes Algorithm 1 ACTSAFE: ACTIVE EXPLORATION WITH SAFETY CONSTRAINTS (Expansion stage)
Init: aleatoric uncertainty σ, probability δ, statistical model (µ0, σ0, β0(δ))
for episode n = 1, . . . , N do
    π_n = argmax_{π ∈ S_n} max_{f ∈ M_n} E_{τ^{π,f}} [ Σ_{t=0}^{T−1} σ_{n−1}(ŝ_t, π(ŝ_t)) ]    ▷ Prepare policy
    D_n ← ROLLOUT(π_n)    ▷ Collect data
    Update (M_n, S_n) with D_{1:n}    ▷ Update statistical model and safe set
end for
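The expansion stage above can be illustrated with a minimal, self-contained Python sketch. All names here are illustrative stand-ins, not the paper's implementation: the statistical model M_n is reduced to a dict of state visit counts, the epistemic uncertainty σ_{n−1} shrinks with visits, and the safe set S_n is a fixed list of allowed policies.

```python
# Toy sketch of ActSafe's expansion stage (Algorithm 1), under the
# simplifying assumptions described above. Each episode, the agent
# picks the safe policy whose imagined trajectory has the highest
# summed model uncertainty, rolls it out, and updates the model.

def uncertainty(model, state):
    # Stand-in for sigma_{n-1}(s): shrinks as a state is visited more.
    return 1.0 / (1.0 + model.get(state, 0))

def expansion_stage(policies, num_episodes, horizon):
    model = {}  # statistical model M_n, here just visit counts
    for n in range(num_episodes):
        # pi_n = argmax_{pi in S_n} sum_t sigma_{n-1}(s_t, pi(s_t))
        policy = max(
            policies,
            key=lambda pi: sum(uncertainty(model, pi(t)) for t in range(horizon)),
        )
        # D_n <- ROLLOUT(pi_n): visit the states the policy reaches,
        # then update M_n with the collected data D_{1:n}.
        for t in range(horizon):
            state = policy(t)
            model[state] = model.get(state, 0) + 1
    return model

# Two toy policies that cover disjoint state regions; greedy
# uncertainty maximization alternates between them.
policies = [lambda t: t, lambda t: t + 10]
model = expansion_stage(policies, num_episodes=4, horizon=5)
```

The `max(..., key=...)` line mirrors the argmax in the pseudocode: exploration is driven purely by the intrinsic uncertainty signal, with safety handled by restricting the search to the safe set (here, the fixed `policies` list).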
Open Source Code Yes We provide an open-source implementation of our experiments in https://github.com/yardenas/actsafe.
Open Datasets Yes Additionally, we show that ACTSAFE scales to high-dimensional environments of the SAFETY-GYM and RWRL benchmarks, excelling in challenging exploration tasks with visual control while also incurring significantly fewer constraint violations.
Dataset Splits No The paper does not explicitly provide training/test/validation dataset splits. Instead, it describes a warm-up period for data collection in an RL setting and mentions sampling episodes for evaluation, which is characteristic of online reinforcement learning rather than predefined dataset splits.
Hardware Specification No The paper does not provide specific hardware details such as exact GPU or CPU models used for running its experiments.
Software Dependencies No The paper mentions several frameworks and tools used, such as Dreamer (Hafner et al., 2023), the Recurrent State Space Model (RSSM) from Hafner et al. (2019), Log-Barrier SGD (LBSGD; Usmanova et al., 2024), and iCEM (Pinneri et al., 2021). However, it does not specify version numbers for general software dependencies such as Python, PyTorch, TensorFlow, or other core libraries.
Experiment Setup Yes For the state-based tasks, we use GPs to model the dynamics f. For the visual control tasks, we use the RSSM model from Hafner et al. (2019) as described in Section 4.3. We thus validate both the theoretical and practical aspects of ACTSAFE in this section. For both environments, we run the algorithms for ten episodes and then use the learned model to plan w.r.t. known extrinsic rewards after the expansion phase. We assume access to an initial data collection (warm-up) period of 200K environment steps, where the agent collects data and uses it to calibrate its world model. We use the same training procedure across all baselines and environments. We set the cost budget for each episode to d = 25 for SAFETY-GYM. Unless specified otherwise, in all our experiments we use 5 random seeds and report the median and standard error across these seeds. Finally, we use a budget of 5M training steps for each training run. For CARTPOLESWINGUPSPARSE, we use a cost budget of d = 100 and an episode length of T = 1000 steps. In our experiments we treat λ as a hyperparameter.
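The reported settings can be collected into a single configuration sketch. The field names below are my own labels, not the paper's; only the values are taken from the excerpt above.

```python
# Illustrative summary of the reported experiment settings as a plain
# config dict. Key names are hypothetical; values follow the paper.
EXPERIMENT_CONFIG = {
    "warmup_env_steps": 200_000,      # initial data-collection (warm-up) period
    "training_steps": 5_000_000,      # budget per training run
    "random_seeds": 5,                # median and standard error reported
    "expansion_episodes": 10,         # episodes before planning with extrinsic rewards
    "cost_budget": {
        "safety_gym": 25,             # per-episode cost budget d
        "cartpole_swingup_sparse": 100,
    },
    "episode_length": {
        "cartpole_swingup_sparse": 1000,  # T steps
    },
}
```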