Mean-Semivariance Policy Optimization via Risk-Averse Reinforcement Learning

Authors: Xiaoteng Ma, Shuai Ma, Li Xia, Qianchuan Zhao

JAIR 2022
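As context for the assessment rows below, the paper's titular mean-semivariance criterion trades expected return against downside (lower) semivariance, E[(min(R − E[R], 0))²]. A minimal sketch over sampled returns follows; the function name and the β trade-off coefficient are illustrative, not the authors' implementation:

```python
def mean_semivariance(returns, beta=1.0):
    """Mean-semivariance criterion on a sample of returns.

    Computes the sample mean penalized by the lower semivariance,
    i.e. the mean squared downside deviation below the mean.
    `beta` is an illustrative risk-aversion trade-off weight.
    """
    mean = sum(returns) / len(returns)
    # Only deviations below the mean contribute to the penalty.
    semivariance = sum(min(r - mean, 0.0) ** 2 for r in returns) / len(returns)
    return mean - beta * semivariance
```

For a symmetric sample like [1.0, 3.0], the mean is 2.0 and only the downside deviation of −1 is penalized, giving 2.0 − β·0.5.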

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we conduct diverse experiments from simple bandit problems to continuous control tasks in MuJoCo, which demonstrate the effectiveness of our proposed methods."
Researcher Affiliation | Academia | Xiaoteng Ma (EMAIL), Department of Automation, Tsinghua University, Beijing, 100086, P. R. China; Shuai Ma (EMAIL), School of Business, Sun Yat-sen University, Guangzhou, 510275, P. R. China; Li Xia (EMAIL, corresponding author), School of Business, Sun Yat-sen University, Guangzhou, 510275, P. R. China; Qianchuan Zhao (EMAIL), Department of Automation, Tsinghua University, Beijing, 100086, P. R. China
Pseudocode | Yes | Algorithm 1: the framework of MSV optimization; Algorithm 2: MSVAC; Algorithm 3: MSVPO
Open Source Code | No | The paper does not contain any explicit statement about the release of source code or a link to a code repository.
Open Datasets | Yes | "Finally, we conduct diverse experiments from simple bandit problems to continuous control tasks in MuJoCo, which demonstrate the effectiveness of our proposed methods."
Dataset Splits | No | The paper describes environments used for experiments (e.g., MuJoCo's Walker2d) but does not provide explicit training/validation/test dataset splits; instead, it describes an experimental protocol for these environments.
Hardware Specification | No | The paper does not specify any particular hardware (e.g., CPU or GPU models) used for running the experiments.
Software Dependencies | No | The paper mentions MuJoCo and OpenAI Gym, but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | Table 3 (hyper-parameters of MSVPO): network learning rate β = 3e-4; network hidden sizes [64, 64]; activation function Tanh; optimizer Adam; batch size 256; gradient clipping 10; clipping parameter ε = 0.2; optimization epochs M = 10; GAE parameter λ = 0.95; average-value constraint coefficient ν = 0.3 in APO (Ma et al., 2021)
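The Table 3 values above can be collected into a plain configuration mapping for reference. The key names below are illustrative and not taken from the authors' code:

```python
# MSVPO hyper-parameters as reported in Table 3 of the paper.
# Key names are illustrative; only the values come from the paper.
msvpo_hparams = {
    "learning_rate_beta": 3e-4,       # network learning rate β
    "hidden_sizes": [64, 64],         # network hidden layer sizes
    "activation": "tanh",             # activation function
    "optimizer": "adam",              # optimizer
    "batch_size": 256,
    "gradient_clipping": 10,
    "clip_eps": 0.2,                  # PPO-style clipping parameter ε
    "optimization_epochs": 10,        # optimization epochs M
    "gae_lambda": 0.95,               # GAE parameter λ
    "apo_constraint_nu": 0.3,         # average-value constraint coefficient ν, APO (Ma et al., 2021)
}
```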