reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Mean-Semivariance Policy Optimization via Risk-Averse Reinforcement Learning

Authors: Xiaoteng Ma, Shuai Ma, Li Xia, Qianchuan Zhao

JAIR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, we conduct diverse experiments from simple bandit problems to continuous control tasks in MuJoCo, which demonstrate the effectiveness of our proposed methods.
Researcher Affiliation	Academia	Xiaoteng Ma, Department of Automation, Tsinghua University, Beijing, 100086, P. R. China; Shuai Ma, School of Business, Sun Yat-sen University, Guangzhou, 510275, P. R. China; Li Xia (Corresponding author), School of Business, Sun Yat-sen University, Guangzhou, 510275, P. R. China; Qianchuan Zhao, Department of Automation, Tsinghua University, Beijing, 100086, P. R. China
Pseudocode	Yes	Algorithm 1: The framework of MSV optimization; Algorithm 2: MSVAC; Algorithm 3: MSVPO
Open Source Code	No	The paper does not contain any explicit statement about the release of source code or a link to a code repository.
Open Datasets	Yes	Finally, we conduct diverse experiments from simple bandit problems to continuous control tasks in MuJoCo, which demonstrate the effectiveness of our proposed methods.
Dataset Splits	No	The paper describes environments used for experiments (e.g., MuJoCo's Walker2d) but does not provide explicit training/test/validation dataset splits. Instead, it describes an experimental protocol for these environments.
Hardware Specification	No	The paper does not specify any particular hardware (e.g., CPU, GPU models) used for running the experiments.
Software Dependencies	No	The paper mentions MuJoCo and OpenAI Gym, but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup	Yes	Table 3: Hyper-parameters of MSVPO. Network learning rate β: 3e-4; Network hidden sizes: [64, 64]; Activation function: Tanh; Optimizer: Adam; Batch size: 256; Gradient clipping: 10; Clipping parameter ε: 0.2; Optimization epochs M: 10; GAE parameter λ: 0.95; Average value constraint coefficient ν in APO (Ma et al., 2021): 0.3
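
For anyone attempting a reimplementation, the Table 3 values quoted above could be collected into a single configuration object. The following is a minimal sketch only: the class name `MSVPOConfig` and all field names are hypothetical and do not come from the paper or from any released code. Only the numeric values and the optimizer and activation names are taken from the table.

```python
from dataclasses import dataclass, asdict
from typing import Tuple


@dataclass(frozen=True)
class MSVPOConfig:
    """Hypothetical container for the MSVPO hyper-parameters listed in Table 3.

    Field names are illustrative assumptions; values are those quoted in the table.
    """

    learning_rate: float = 3e-4            # network learning rate (beta)
    hidden_sizes: Tuple[int, ...] = (64, 64)  # network hidden sizes
    activation: str = "tanh"               # activation function
    optimizer: str = "adam"                # optimizer
    batch_size: int = 256                  # batch size
    # The table says only "Gradient Clipping: 10"; whether this clips by
    # norm or by value is not stated in the quoted text.
    grad_clip: float = 10.0
    clip_epsilon: float = 0.2              # clipping parameter (epsilon)
    optimization_epochs: int = 10          # optimization epochs (M)
    gae_lambda: float = 0.95               # GAE parameter (lambda)
    avg_value_constraint_coef: float = 0.3  # APO average value constraint coefficient (nu)


config = MSVPOConfig()
config_dict = asdict(config)  # plain dict, convenient for logging alongside results
```

Recording the configuration this way, together with the software versions and hardware that the paper leaves unspecified, would help close the gaps flagged in the table above.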