A Distance-based Anomaly Detection Framework for Deep Reinforcement Learning
Authors: Hongming Zhang, Ke Sun, Bo Xu, Linglong Kong, Martin Müller
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on classical control environments, Atari games, and autonomous driving scenarios, we demonstrate the effectiveness of our MD-based detection framework. |
| Researcher Affiliation | Academia | Hongming Zhang (EMAIL): Department of Computing Science, University of Alberta; Alberta Machine Intelligence Institute (Amii), University of Alberta. Ke Sun (EMAIL): Department of Mathematical and Statistical Sciences, University of Alberta; Alberta Machine Intelligence Institute (Amii), University of Alberta. Bo Xu (EMAIL): Institute of Automation, Chinese Academy of Sciences. Linglong Kong (EMAIL): Department of Mathematical and Statistical Sciences, University of Alberta; Alberta Machine Intelligence Institute (Amii), University of Alberta. Martin Müller (EMAIL): Department of Computing Science, University of Alberta; Alberta Machine Intelligence Institute (Amii), University of Alberta. |
| Pseudocode | Yes | Algorithm 1: MDX Detection Framework in the Offline Setting; Algorithm 2: MDX Detection Framework in the Online Setting, PPO Style |
| Open Source Code | No | No explicit statement or link to the source code for the methodology described in the paper is provided. The OpenReview link points to the paper-review platform, not a code repository. |
| Open Datasets | Yes | For feature-input tasks, we choose two classical control environments in OpenAI Gym (Brockman et al., 2016), including Mountain Car (Barto et al., 1983) and Cart Pole (Moore, 1990). For image-input tasks, we choose six Atari games (Bellemare et al., 2013). We further conduct experiments on autonomous driving environments (Dosovitskiy et al., 2017) as one potential application. |
| Dataset Splits | Yes | In the offline setting, we randomly split the states from the given dataset into calibration and evaluation sets, each containing 50% of the data. |
| Hardware Specification | No | The paper describes the policy network architectures (e.g., 'two fully connected layers, each containing 128 units with ReLU activation functions' or 'the same network architecture as described in the PPO paper') but does not specify any hardware details like GPU models, CPU types, or memory used for running experiments. |
| Software Dependencies | No | The paper mentions using Proximal Policy Optimization (PPO) as the baseline RL algorithm and environments like Open AI Gym and CARLA, but it does not specify version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow, or specific library versions). |
| Experiment Setup | Yes | Hyperparameters in our methods are shown in Table 7 (training-phase hyperparameters; RL-related parameters are the same as those of the PPO algorithm): confidence level (1−α) = 0.95 (α = 0.05); moving window size (m) = 5120; sample size (Nc) = 2560; iterations (K) = 10000 (1e7 steps in total); environment number (N) = 8; horizon (T) = 128. |
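The offline setting described above (a random 50/50 calibration/evaluation split, with a detection threshold at the 1−α = 0.95 confidence level) can be sketched with a generic Mahalanobis-distance detector. This is a minimal illustration, not the paper's MDX implementation: the feature extractor, the Gaussian fit, and the `1e-6` covariance regularizer are all assumptions made here for a self-contained example.

```python
import numpy as np

def fit_gaussian(features):
    """Estimate the mean and (regularized) inverse covariance of features."""
    mu = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    return mu, np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))

def mahalanobis_sq(x, mu, cov_inv):
    """Squared Mahalanobis distance of each row of x to the fitted Gaussian."""
    d = x - mu
    return np.einsum("ij,jk,ik->i", d, cov_inv, d)

rng = np.random.default_rng(0)
states = rng.normal(size=(1000, 4))  # stand-in for policy-network features

# Randomly split states 50/50 into calibration and evaluation sets.
idx = rng.permutation(len(states))
calib, evalset = states[idx[:500]], states[idx[500:]]

mu, cov_inv = fit_gaussian(calib)
# Threshold = (1 - alpha) quantile of calibration distances, alpha = 0.05.
threshold = np.quantile(mahalanobis_sq(calib, mu, cov_inv), 0.95)

anomalies = mahalanobis_sq(evalset, mu, cov_inv) > threshold
print(f"flagged {anomalies.mean():.1%} of evaluation states as anomalous")
```

By construction, roughly α of in-distribution evaluation states land above the calibration quantile; genuinely anomalous states would be flagged at a much higher rate.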
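The online setting can be sketched in the same spirit, using the Table 7 values (moving window size m = 5120, sample size Nc = 2560, α = 0.05): keep a sliding window of recent features and periodically re-fit the detector on a random calibration sample drawn from it. The window management and re-fitting schedule here are assumptions for illustration, not the paper's Algorithm 2.

```python
from collections import deque

import numpy as np

rng = np.random.default_rng(1)
m, n_c, alpha = 5120, 2560, 0.05  # window size, calibration sample size, 1 - confidence

window = deque(maxlen=m)  # most recent in-distribution features

def refit_detector():
    """Re-fit the Gaussian and threshold on a random sample from the window."""
    sample = rng.choice(np.asarray(window), size=n_c, replace=False)
    mu = sample.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(sample, rowvar=False) + 1e-6 * np.eye(sample.shape[1]))
    d = np.einsum("ij,jk,ik->i", sample - mu, cov_inv, sample - mu)
    return mu, cov_inv, np.quantile(d, 1 - alpha)

# Simulate filling the window with rollout features, then refresh the detector.
for _ in range(m):
    window.append(rng.normal(size=4))
mu, cov_inv, tau = refit_detector()

x = rng.normal(size=(1, 4))
dist = np.einsum("ij,jk,ik->i", x - mu, cov_inv, x - mu)
print("anomalous:", bool(dist[0] > tau))
```

In an actual PPO-style loop, the window would be appended to after each rollout of N × T transitions and the detector refreshed once per iteration.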