Risk-Sensitive Variational Actor-Critic: A Model-Based Approach
Authors: Alonso Granados, Mohammadreza Ebrahimi, Jason Pacheco
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that this approach produces risk-sensitive policies and yields improvements in both tabular and risk-aware variants of complex continuous control tasks in MuJoCo. We evaluate the ability of rsVAC to learn risk-sensitive policies in a variety of risky environments. First, we consider a risky variation of the tabular environment... We next evaluate the inclusion of function approximators in a continuous 2D environment... Finally, we compare rsVAC to risk-sensitive baseline methods in variations of three challenging MuJoCo environments. |
| Researcher Affiliation | Academia | Alonso Granados Department of Computer Science University of Arizona Tucson, AZ, USA EMAIL Reza Ebrahimi School of Information Systems and Management University of South Florida Tampa, FL, USA EMAIL Jason Pacheco Department of Computer Science University of Arizona Tucson, AZ, USA EMAIL |
| Pseudocode | Yes | Pseudocode for the rsVAC algorithm can be found in Appendix F. Algorithm 1: rsVAC |
| Open Source Code | Yes | Code is available at https://github.com/AlonsoGranados/rsVAC/. |
| Open Datasets | Yes | We consider a risky variation of the tabular environment discussed in Eysenbach et al. (2022)... We use the MuJoCo physics engine (Todorov et al., 2012) in Gymnasium (Towers et al., 2023) to evaluate our method on three continuous tasks (InvertedPendulum, HalfCheetah, and Swimmer). |
| Dataset Splits | No | The paper describes how data is generated through interaction with environments (e.g., sampling 1000 episodes, 10 random trials, 20 episodes per evaluation) rather than using predefined static dataset splits. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are mentioned for the experimental setup. |
| Software Dependencies | No | The paper mentions various algorithms and frameworks such as TD3, SAC, PPO, MuJoCo, and Gymnasium, but it does not specify version numbers for these or for other software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Table 1: Hyperparameters for stochastic continuous 2D environment (Discount factor 0.9, Soft target update 0.005, Learning rate 0.0003, MLP with 2 hidden layers of size 256). Table 2: Hyperparameters for risk-aware MuJoCo benchmark (Discount factor 0.99, Soft target update 0.005, Learning rate 0.0003, MLP with 2 hidden layers of size 256, β initialization -1). |
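
To make the Experiment Setup row concrete, the sketch below collects the reported Table 2 values in a plain dictionary and shows what a soft target update with coefficient 0.005 usually means. It assumes the standard Polyak-averaging form used in TD3/SAC-style methods; the summary above does not confirm that rsVAC uses exactly this rule. The names `MUJOCO_CONFIG` and `soft_update` are hypothetical and do not come from the authors' repository.

```python
# Hypothetical config keys; the values are those reported for Table 2.
MUJOCO_CONFIG = {
    "discount": 0.99,
    "tau": 0.005,              # soft target update coefficient
    "learning_rate": 3e-4,
    "hidden_sizes": (256, 256),  # MLP with 2 hidden layers of size 256
    "beta_init": -1.0,         # reported beta initialization
}


def soft_update(target, online, tau):
    """Assumed Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]


# Toy parameters stored as flat lists of floats instead of network weights.
target = [0.0, 1.0]
online = [1.0, 1.0]
target = soft_update(target, online, MUJOCO_CONFIG["tau"])
```

With a coefficient this small, the target parameters move only 0.5% of the way toward the online parameters per update, which is the usual reason for choosing such a value: it keeps bootstrapped targets slowly varying.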