Risk-Sensitive Variational Actor-Critic: A Model-Based Approach

Authors: Alonso Granados, Mohammadreza Ebrahimi, Jason Pacheco

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate that this approach produces risk-sensitive policies and yields improvements in both tabular and risk-aware variants of complex continuous control tasks in MuJoCo. We evaluate the ability of rsVAC to learn risk-sensitive policies in a variety of risky environments. First, we consider a risky variation of the tabular environment... We next evaluate the inclusion of function approximators in a continuous 2D environment... Finally, we compare rsVAC to risk-sensitive baseline methods in variations of three challenging MuJoCo environments.
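The "risk-sensitive policies" evaluated above are typically obtained by optimizing a risk-adjusted return rather than the plain expected return. As a rough illustration only (the paper's exact objective is not reproduced here; this is the standard exponential-utility certainty equivalent often used in risk-sensitive RL, with a risk parameter β < 0 giving risk aversion):

```python
import math

def exp_utility(returns, beta):
    """Exponential-utility certainty equivalent: (1/beta) * log E[exp(beta * R)].

    beta < 0 penalizes low-return (risky) outcomes more heavily than the
    plain mean does; beta -> 0 recovers the ordinary expected return.
    """
    m = sum(math.exp(beta * r) for r in returns) / len(returns)
    return math.log(m) / beta

# Mostly good outcomes with one large loss: the risk-averse value
# (beta = -1, matching the sign of the paper's beta initialization)
# sits well below the arithmetic mean of -1.0.
risky_returns = [1.0, 1.0, -5.0]
print(exp_utility(risky_returns, beta=-1.0))
```

A risk-averse agent optimizing this quantity prefers policies that avoid the rare large loss, which is the qualitative behavior the excerpt describes for the risky environment variants.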
Researcher Affiliation | Academia | Alonso Granados, Department of Computer Science, University of Arizona, Tucson, AZ, USA (EMAIL); Reza Ebrahimi, School of Information Systems and Management, University of South Florida, Tampa, FL, USA (EMAIL); Jason Pacheco, Department of Computer Science, University of Arizona, Tucson, AZ, USA (EMAIL)
Pseudocode | Yes | Pseudocode for the rsVAC algorithm can be found in Appendix F (Algorithm 1, rsVAC).
Open Source Code | Yes | Code is available at https://github.com/AlonsoGranados/rsVAC/.
Open Datasets | Yes | We consider a risky variation of the tabular environment discussed in Eysenbach et al. (2022)... We use the MuJoCo physics engine (Todorov et al., 2012) in Gymnasium (Towers et al., 2023) to evaluate our method on three continuous tasks (Inverted Pendulum, HalfCheetah, and Swimmer).
Dataset Splits | No | The paper describes how data is generated through interaction with environments (e.g., sampling 1000 episodes, 10 random trials, 20 episodes per evaluation) rather than using predefined static dataset splits.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are mentioned for the experimental setup.
Software Dependencies | No | The paper mentions algorithms and frameworks such as TD3, SAC, PPO, MuJoCo, and Gymnasium, but does not specify version numbers for these or for other software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | Table 1, hyperparameters for the stochastic continuous 2D environment: discount factor 0.9, soft target update 0.005, learning rate 0.0003, MLP with 2 hidden layers of size 256. Table 2, hyperparameters for the risk-aware MuJoCo benchmark: discount factor 0.99, soft target update 0.005, learning rate 0.0003, MLP with 2 hidden layers of size 256, β initialization -1.
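The "soft target update 0.005" entry is the Polyak-averaging coefficient standard in actor-critic methods: target-network parameters track the online network slowly. A minimal sketch using the Table 2 values (the function name and config layout are illustrative, not taken from the paper's code):

```python
def soft_update(target_params, online_params, tau=0.005):
    """Polyak-average online parameters into the target network:
    target <- tau * online + (1 - tau) * target."""
    return [tau * o + (1.0 - tau) * t
            for t, o in zip(target_params, online_params)]

# Table 2 hyperparameters (risk-aware MuJoCo benchmark), as a config dict.
config = {
    "discount": 0.99,
    "soft_target_update": 0.005,
    "learning_rate": 3e-4,
    "hidden_sizes": (256, 256),   # MLP, 2 hidden layers of size 256
    "beta_init": -1.0,            # risk parameter initialization
}

# One update step moves the target 0.5% of the way toward the online value.
print(soft_update([0.0], [1.0], config["soft_target_update"]))
```

With tau this small, the target network changes slowly across updates, which is what stabilizes the bootstrapped critic targets.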