Extreme Risk Mitigation in Reinforcement Learning using Extreme Value Theory
Authors: Karthik Somayaji NS, Yu Wang, Malachi Schram, Jan Drgona, Mahantesh M Halappanavar, Frank Liu, Peng Li
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluations show that the proposed method outperforms other risk-averse RL algorithms on a diverse range of benchmark tasks, each encompassing distinct risk scenarios. |
| Researcher Affiliation | Academia | Karthik Somayaji NS EMAIL Department of Electrical and Computer Engineering, University of California Santa Barbara; Yu Wang EMAIL Department of Electrical and Computer Engineering, University of California Santa Barbara; Malachi Schram EMAIL Thomas Jefferson National Accelerator Laboratory; Jan Drgona EMAIL Pacific Northwest National Laboratory; Mahantesh Halappanavar EMAIL Pacific Northwest National Laboratory; Frank Liu EMAIL School of Data Science, Old Dominion University; Peng Li EMAIL Department of Electrical and Computer Engineering, University of California Santa Barbara |
| Pseudocode | Yes | Section 7.4, Algorithm for EVAC; Algorithm 1: Extreme Valued Actor Critic (EVAC) |
| Open Source Code | No | The paper uses well-known open-source environments such as Mujoco and Safety-gym, and cites them. However, it does not provide any explicit statement or link to the source code for the authors' own method (EVAC) described in this paper. |
| Open Datasets | Yes | We use the Half-Cheetah environment (Brockman et al., 2016) for our demonstration. We experiment on two benchmark OpenAI environments (Brockman et al., 2016), namely Mujoco environments and Safety-gym environments (Ji et al., 2023). We employ mobile-env (Schneider et al., 2022), an open-source environment that simulates the connections and QoE between several base stations and cell phone users. |
| Dataset Splits | No | The paper describes reinforcement learning environments where data is generated through agent interaction. It specifies training duration (100,000 time steps) and evaluation procedure (inference on 5 trained agents, each completing an episode) but does not provide explicit training/test/validation dataset splits in the conventional supervised learning sense. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to conduct the experiments. |
| Software Dependencies | No | The paper mentions using Python and various environments (Mujoco, Safety-gym, mobile-env) but does not provide specific version numbers for any software dependencies, such as programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | During training and inference, the max episode length of the agent is set to 1000. During training, the agents were trained for 100,000 time steps in total. The batch size B = 128, and we set K, the number of samples drawn from the GPD distribution, to 50. We set the learning rates for the actor and critic to 0.001 in all cases. The discount factor γ = 0.99 in all cases as well. The soft update parameter τ = 0.02 for all our experiments on Hopper and Walker2d, while τ = 0.01 for the Half-Cheetah environment. Both the actor and critic have 3 layers with hidden size 128. |
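
The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch for anyone attempting replication. The dictionary below, its key names, and the GPD sampler are illustrative assumptions, not the authors' code: the sampler is a generic inverse-transform draw from a Generalized Pareto Distribution, with shape `xi` and scale `sigma` chosen arbitrarily since the paper excerpt does not report them.

```python
import math
import random

# Hyperparameters as reported in the paper's experiment setup; the key
# names here are illustrative, not taken from the authors' code.
EVAC_CONFIG = {
    "max_episode_length": 1000,
    "total_training_steps": 100_000,
    "batch_size": 128,           # B
    "gpd_samples": 50,           # K, samples drawn from the GPD tail model
    "actor_lr": 1e-3,
    "critic_lr": 1e-3,
    "discount_gamma": 0.99,
    "soft_update_tau": {"Hopper": 0.02, "Walker2d": 0.02, "Half-Cheetah": 0.01},
    "hidden_layers": 3,
    "hidden_size": 128,
}

def sample_gpd(k, xi=0.3, sigma=1.0, seed=0):
    """Draw k samples from a Generalized Pareto Distribution via
    inverse-transform sampling: Q(u) = (sigma / xi) * ((1 - u)**(-xi) - 1).
    xi and sigma are placeholder values, not figures from the paper."""
    rng = random.Random(seed)
    out = []
    for _ in range(k):
        u = rng.random()
        if abs(xi) < 1e-12:
            # Exponential limit of the GPD as xi -> 0.
            out.append(-sigma * math.log(1.0 - u))
        else:
            out.append(sigma / xi * ((1.0 - u) ** (-xi) - 1.0))
    return out

samples = sample_gpd(EVAC_CONFIG["gpd_samples"])
```

For xi > 0 and sigma > 0 every draw is non-negative, matching the GPD's support on [0, ∞), which is why EVT-based methods use it to model the tail of the return distribution beyond a threshold.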