Risk-sensitive control as inference with Rényi divergence

Authors: Kaito Ito, Kenji Kashima

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The behavior of the risk-sensitive soft actor-critic is examined via an experiment.
Researcher Affiliation | Academia | Kaito Ito (The University of Tokyo), Kenji Kashima (Kyoto University)
Pseudocode | No | The paper describes algorithms but does not provide them in a structured pseudocode or algorithm block.
Open Source Code | Yes | The code is available at https://github.com/kaito-1111/risk-sensitive-sac.git.
Open Datasets | Yes | The environment is Pendulum-v1 in OpenAI Gymnasium.
Dataset Splits | No | The paper mentions training and testing but does not provide specific percentages or absolute counts for dataset splits (train/validation/test).
Hardware Specification | Yes | For the training, we used an Ubuntu 20.04 server (GPU: NVIDIA GeForce RTX 2080Ti).
Software Dependencies | No | The implementation of the risk-sensitive SAC (RSAC) algorithm follows the stable-baselines3 [50] version of the SAC algorithm... optimizer Adam [51]. No specific version numbers for these or other software are provided.
Experiment Setup | Yes | Now, we introduce a series of hyperparameters listed in Table 1 shared for both SAC and RSAC algorithms. Table 1: SAC and RSAC Hyperparameters (Parameter: Value). optimizer: Adam [51]; learning rate: 10^-3; discount factor: 0.99; regularization coefficient: 0.1; target smoothing coefficient: 0.005; replay buffer size: 10^5; number of critic networks: 2; number of hidden layers (all networks): 2; number of hidden units per layer: 256; activation function: ReLU.
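
The Table 1 hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object. The following is only a sketch, not the authors' code: the class `Table1Hyperparameters` and the helper `to_sb3_sac_kwargs` are hypothetical names, the learning rate is taken as 10^-3 and the replay buffer size as 10^5, and the mapping of the "regularization coefficient" onto the stable-baselines3 entropy coefficient `ent_coef` is an assumption that the source does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table1Hyperparameters:
    """Values transcribed from Table 1 (shared by SAC and RSAC)."""

    learning_rate: float = 1e-3            # optimizer: Adam
    discount_factor: float = 0.99
    regularization_coefficient: float = 0.1
    target_smoothing_coefficient: float = 0.005
    replay_buffer_size: int = 10**5
    n_critics: int = 2
    n_hidden_layers: int = 2               # applies to all networks
    hidden_units: int = 256
    activation: str = "ReLU"


def to_sb3_sac_kwargs(hp: Table1Hyperparameters) -> dict:
    """Map Table 1 onto stable-baselines3 SAC constructor argument names.

    Assumption: `regularization_coefficient` corresponds to `ent_coef`.
    Adam and ReLU are left unset because they are already the defaults
    of the stable-baselines3 SAC policy.
    """
    return {
        "learning_rate": hp.learning_rate,
        "gamma": hp.discount_factor,
        "ent_coef": hp.regularization_coefficient,
        "tau": hp.target_smoothing_coefficient,
        "buffer_size": hp.replay_buffer_size,
        "policy_kwargs": {
            "n_critics": hp.n_critics,
            "net_arch": [hp.hidden_units] * hp.n_hidden_layers,
        },
    }


kwargs = to_sb3_sac_kwargs(Table1Hyperparameters())
print(kwargs)
```

With stable-baselines3 and Gymnasium installed, these arguments could presumably be passed as `SAC("MlpPolicy", "Pendulum-v1", **kwargs)`. That would configure the plain SAC baseline only; the risk-sensitive variant (RSAC) is in the authors' repository linked in the Open Source Code row.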