Counterfactual Explanations for Continuous Action Reinforcement Learning

Authors: Shuyang Dong, Shangtong Zhang, Lu Feng

IJCAI 2025

Reproducibility assessment (variable, result, and supporting evidence from the paper):
Research Type: Experimental
Evidence: "Evaluations in two RL domains, Diabetes Control and Lunar Lander, demonstrate the effectiveness, efficiency, and generalization of our approach, enabling more interpretable and trustworthy RL applications." "Experimental results demonstrate the effectiveness, efficiency, and generalization of our approach, paving the way for more interpretable and trustworthy RL applications in high-stakes settings."
Researcher Affiliation: Academia
Evidence: "Shuyang Dong, Shangtong Zhang and Lu Feng, University of Virginia, EMAIL"
Pseudocode: Yes
Evidence: "Algorithm 1: Counterfactual Generation"
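The paper presents its counterfactual generation only as pseudocode (Algorithm 1). A minimal, self-contained sketch of the general idea, gradient descent on an action sequence that trades off reaching a target outcome against staying close to the original actions, might look like the following. The function name `generate_counterfactual`, the loss form, and the distance weight `lam` are illustrative assumptions, not the authors' implementation.

```python
def generate_counterfactual(a_orig, outcome, target, lr=1e-4, steps=50, lam=0.1):
    """Hypothetical sketch of gradient-based counterfactual generation.

    Nudges an action sequence `a_orig` so that a differentiable outcome
    model `outcome` (list of actions -> scalar) approaches `target`,
    while a proximity penalty keeps the actions close to the original.
    Uses finite differences so the sketch needs no autodiff library.
    """
    def loss(a):
        # Assumed counterfactual loss: outcome gap plus proximity penalty.
        gap = (outcome(a) - target) ** 2
        dist = sum((x - y) ** 2 for x, y in zip(a, a_orig))
        return gap + lam * dist

    a = list(a_orig)
    eps = 1e-5
    for _ in range(steps):
        for i in range(len(a)):
            # Central finite-difference estimate of d(loss)/d(a[i]).
            a_hi = list(a); a_hi[i] += eps
            a_lo = list(a); a_lo[i] -= eps
            grad_i = (loss(a_hi) - loss(a_lo)) / (2 * eps)
            a[i] -= lr * grad_i
    return a
```

With a toy outcome model such as `outcome=sum`, a larger learning rate, and more steps, the returned actions move the outcome close to the target while the proximity term keeps them from drifting arbitrarily far.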
Open Source Code: Yes
Evidence: "Our implementation is based on Stable-Baselines3 [Raffin et al., 2021]." "Code is available at: https://github.com/safe-autonomy-lab/Counterfactual RL"
Open Datasets: Yes
Evidence: "We implemented the proposed approach and evaluated it in two RL domains: (i) diabetes control using the FDA-approved UVA/PADOVA simulator [Man et al., 2014], and (ii) Lunar Lander from OpenAI Gym [Brockman, 2016]."
Dataset Splits: Yes
Evidence: "Both settings included 18 unique trajectories in each training and test set." "Both settings included 12 randomly sampled trajectories in each training and test set."
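The splits quoted above (a fixed number of trajectories per training and test set, randomly sampled) amount to a plain random partition. A sketch of that procedure is below; the helper name `split_trajectories` and its interface are illustrative assumptions, not the authors' code.

```python
import random

def split_trajectories(trajectories, n_train, n_test, seed=0):
    """Randomly partition trajectories into disjoint train/test sets
    of fixed sizes (e.g. 18/18 or 12/12 per setting, as reported)."""
    if n_train + n_test > len(trajectories):
        raise ValueError("not enough trajectories for the requested split")
    rng = random.Random(seed)       # fixed seed for a reproducible split
    shuffled = list(trajectories)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:n_train + n_test]
```

For example, `split_trajectories(all_trajs, 18, 18)` yields two disjoint 18-trajectory sets matching the sizes reported for the first setting.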
Hardware Specification: No
Evidence: No specific hardware details (such as GPU models, CPU types, or memory amounts) are provided in the paper.
Software Dependencies: No
Evidence: "Our implementation is based on Stable-Baselines3 [Raffin et al., 2021]." While Stable-Baselines3 is mentioned, specific version numbers for it or for other dependencies such as Python, PyTorch, or CUDA are not provided in the text.
Experiment Setup: Yes
Evidence: "In the single-environment setting, a baseline policy was trained on a chosen patient profile for 100,000 steps, with a learning rate of 0.0001 and a gradient step size of 50." "After a warm-up phase, batches of 256 trajectories were sampled for model updates using a learning rate of 0.0001 and 50 gradient steps." "In the single-environment setting, a baseline policy was trained for 3,000 steps with a learning rate of 0.0001 and a gradient step size of 20." "The proposed approach was used to generate counterfactual trajectories with a learning rate of 0.00001 and a gradient step size of 20."
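One plausible way to organize the hyperparameters quoted above is a per-domain configuration mapping. The dictionary keys and grouping below are illustrative assumptions for readability; only the numeric values come from the paper.

```python
# Hypothetical grouping of the hyperparameters reported in the paper.
# Structure and key names are illustrative, not the authors' code.
EXPERIMENT_CONFIG = {
    "diabetes_control": {
        "baseline_policy": {"train_steps": 100_000,
                            "learning_rate": 1e-4,
                            "gradient_steps": 50},
        "model_updates":   {"batch_size": 256,      # trajectories per batch
                            "learning_rate": 1e-4,
                            "gradient_steps": 50},
    },
    "lunar_lander": {
        "baseline_policy": {"train_steps": 3_000,
                            "learning_rate": 1e-4,
                            "gradient_steps": 20},
        "counterfactual":  {"learning_rate": 1e-5,
                            "gradient_steps": 20},
    },
}
```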