Risk-Sensitive Variational Actor-Critic: A Model-Based Approach

Authors: Alonso Granados, Mohammadreza Ebrahimi, Jason Pacheco

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate that this approach produces risk-sensitive policies and yields improvements in both tabular and risk-aware variants of complex continuous control tasks in MuJoCo. We evaluate the ability of rsVAC to learn risk-sensitive policies in a variety of risky environments. First, we consider a risky variation of the tabular environment... We next evaluate the inclusion of function approximators in a continuous 2D environment... Finally, we compare rsVAC to risk-sensitive baseline methods in variations of three challenging MuJoCo environments.
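The "risk-sensitive policies" evaluated above are typically obtained by optimizing a risk-adjusted return rather than the plain expected return. As a rough illustration only (the paper's exact objective is not reproduced here; this is the standard exponential-utility certainty equivalent often used in risk-sensitive RL, with a risk parameter β < 0 giving risk aversion):

```python
import math

def exp_utility(returns, beta):
    """Exponential-utility certainty equivalent: (1/beta) * log E[exp(beta * R)].

    beta < 0 penalizes low-return (risky) outcomes more heavily than the
    plain mean does; beta -> 0 recovers the ordinary expected return.
    """
    m = sum(math.exp(beta * r) for r in returns) / len(returns)
    return math.log(m) / beta

# Mostly good outcomes with one large loss: the risk-averse value
# (beta = -1, matching the sign of the paper's beta initialization)
# sits well below the arithmetic mean of -1.0.
risky_returns = [1.0, 1.0, -5.0]
print(exp_utility(risky_returns, beta=-1.0))
```

A risk-averse agent optimizing this quantity prefers policies that avoid the rare large loss, which is the qualitative behavior the excerpt describes for the risky environment variants.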
Researcher Affiliation | Academia | Alonso Granados, Department of Computer Science, University of Arizona, Tucson, AZ, USA (EMAIL); Reza Ebrahimi, School of Information Systems and Management, University of South Florida, Tampa, FL, USA (EMAIL); Jason Pacheco, Department of Computer Science, University of Arizona, Tucson, AZ, USA (EMAIL)
Pseudocode | Yes | Pseudocode for the rsVAC algorithm can be found in Appendix F (Algorithm 1, rsVAC).
Open Source Code | Yes | Code is available at https://github.com/AlonsoGranados/rsVAC/.
Open Datasets | Yes | We consider a risky variation of the tabular environment discussed in Eysenbach et al. (2022)... We use the MuJoCo physics engine (Todorov et al., 2012) in Gymnasium (Towers et al., 2023) to evaluate our method on three continuous tasks (Inverted Pendulum, HalfCheetah, and Swimmer).
Dataset Splits | No | The paper describes how data is generated through interaction with environments (e.g., sampling 1000 episodes, 10 random trials, 20 episodes per evaluation) rather than using predefined static dataset splits.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are mentioned for the experimental setup.
Software Dependencies | No | The paper mentions algorithms and frameworks such as TD3, SAC, PPO, MuJoCo, and Gymnasium, but does not specify version numbers for these or for other software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | Table 1, hyperparameters for the stochastic continuous 2D environment: discount factor 0.9, soft target update 0.005, learning rate 0.0003, MLP with 2 hidden layers of size 256. Table 2, hyperparameters for the risk-aware MuJoCo benchmark: discount factor 0.99, soft target update 0.005, learning rate 0.0003, MLP with 2 hidden layers of size 256, β initialization -1.
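The "soft target update 0.005" entry is the Polyak-averaging coefficient standard in actor-critic methods: target-network parameters track the online network slowly. A minimal sketch using the Table 2 values (the function name and config layout are illustrative, not taken from the paper's code):

```python
def soft_update(target_params, online_params, tau=0.005):
    """Polyak-average online parameters into the target network:
    target <- tau * online + (1 - tau) * target."""
    return [tau * o + (1.0 - tau) * t
            for t, o in zip(target_params, online_params)]

# Table 2 hyperparameters (risk-aware MuJoCo benchmark), as a config dict.
config = {
    "discount": 0.99,
    "soft_target_update": 0.005,
    "learning_rate": 3e-4,
    "hidden_sizes": (256, 256),   # MLP, 2 hidden layers of size 256
    "beta_init": -1.0,            # risk parameter initialization
}

# One update step moves the target 0.5% of the way toward the online value.
print(soft_update([0.0], [1.0], config["soft_target_update"]))
```

With tau this small, the target network changes slowly across updates, which is what stabilizes the bootstrapped critic targets.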