A Distance-based Anomaly Detection Framework for Deep Reinforcement Learning
Authors: Hongming Zhang, Ke Sun, Bo Xu, Linglong Kong, Martin Müller
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on classical control environments, Atari games, and autonomous driving scenarios, we demonstrate the effectiveness of our MD-based detection framework. |
| Researcher Affiliation | Academia | Hongming Zhang (EMAIL): Department of Computing Science, University of Alberta; Alberta Machine Intelligence Institute (Amii), University of Alberta. Ke Sun (EMAIL): Department of Mathematical and Statistical Sciences, University of Alberta; Alberta Machine Intelligence Institute (Amii), University of Alberta. Bo Xu (EMAIL): Institute of Automation, Chinese Academy of Sciences. Linglong Kong (EMAIL): Department of Mathematical and Statistical Sciences, University of Alberta; Alberta Machine Intelligence Institute (Amii), University of Alberta. Martin Müller (EMAIL): Department of Computing Science, University of Alberta; Alberta Machine Intelligence Institute (Amii), University of Alberta. |
| Pseudocode | Yes | Algorithm 1: MDX Detection Framework in the Offline Setting; Algorithm 2: MDX Detection Framework in the Online Setting, PPO Style |
| Open Source Code | No | No explicit statement or link to the source code for the methodology described in the paper is provided. The OpenReview link points to the paper-review platform, not a code repository. |
| Open Datasets | Yes | For feature-input tasks, we choose two classical control environments in OpenAI Gym (Brockman et al., 2016), including Mountain Car (Barto et al., 1983) and Cart Pole (Moore, 1990). For image-input tasks, we choose six Atari games (Bellemare et al., 2013). We further conduct experiments on autonomous driving environments (Dosovitskiy et al., 2017) as one potential application. |
| Dataset Splits | Yes | In the offline setting, we randomly split the states from the given dataset into calibration and evaluation sets, each containing 50% of the data. |
| Hardware Specification | No | The paper describes the policy network architectures (e.g., 'two fully connected layers, each containing 128 units with ReLU activation functions' or 'the same network architecture as described in the PPO paper') but does not specify any hardware details like GPU models, CPU types, or memory used for running experiments. |
| Software Dependencies | No | The paper mentions using Proximal Policy Optimization (PPO) as the baseline RL algorithm and environments like Open AI Gym and CARLA, but it does not specify version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow, or specific library versions). |
| Experiment Setup | Yes | Hyperparameters in our methods are shown in Table 7 (training-phase hyperparameters; RL-related parameters are the same as those of the PPO algorithm): confidence level (1−α) = 0.95 (α = 0.05); moving window size (m) = 5120; sample size (Nc) = 2560; iterations (K) = 10000 (1e7 steps in total); environment number (N) = 8; horizon (T) = 128. |
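The offline setting described above (a random 50/50 calibration/evaluation split, with a detection threshold at the 1−α = 0.95 confidence level) can be sketched with a generic Mahalanobis-distance detector. This is a minimal illustration, not the paper's MDX implementation: the feature extractor, the Gaussian fit, and the `1e-6` covariance regularizer are all assumptions made here for a self-contained example.

```python
import numpy as np

def fit_gaussian(features):
    """Estimate the mean and (regularized) inverse covariance of features."""
    mu = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    return mu, np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))

def mahalanobis_sq(x, mu, cov_inv):
    """Squared Mahalanobis distance of each row of x to the fitted Gaussian."""
    d = x - mu
    return np.einsum("ij,jk,ik->i", d, cov_inv, d)

rng = np.random.default_rng(0)
states = rng.normal(size=(1000, 4))  # stand-in for policy-network features

# Randomly split states 50/50 into calibration and evaluation sets.
idx = rng.permutation(len(states))
calib, evalset = states[idx[:500]], states[idx[500:]]

mu, cov_inv = fit_gaussian(calib)
# Threshold = (1 - alpha) quantile of calibration distances, alpha = 0.05.
threshold = np.quantile(mahalanobis_sq(calib, mu, cov_inv), 0.95)

anomalies = mahalanobis_sq(evalset, mu, cov_inv) > threshold
print(f"flagged {anomalies.mean():.1%} of evaluation states as anomalous")
```

By construction, roughly α of in-distribution evaluation states land above the calibration quantile; genuinely anomalous states would be flagged at a much higher rate.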
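The online setting can be sketched in the same spirit, using the Table 7 values (moving window size m = 5120, sample size Nc = 2560, α = 0.05): keep a sliding window of recent features and periodically re-fit the detector on a random calibration sample drawn from it. The window management and re-fitting schedule here are assumptions for illustration, not the paper's Algorithm 2.

```python
from collections import deque

import numpy as np

rng = np.random.default_rng(1)
m, n_c, alpha = 5120, 2560, 0.05  # window size, calibration sample size, 1 - confidence

window = deque(maxlen=m)  # most recent in-distribution features

def refit_detector():
    """Re-fit the Gaussian and threshold on a random sample from the window."""
    sample = rng.choice(np.asarray(window), size=n_c, replace=False)
    mu = sample.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(sample, rowvar=False) + 1e-6 * np.eye(sample.shape[1]))
    d = np.einsum("ij,jk,ik->i", sample - mu, cov_inv, sample - mu)
    return mu, cov_inv, np.quantile(d, 1 - alpha)

# Simulate filling the window with rollout features, then refresh the detector.
for _ in range(m):
    window.append(rng.normal(size=4))
mu, cov_inv, tau = refit_detector()

x = rng.normal(size=(1, 4))
dist = np.einsum("ij,jk,ik->i", x - mu, cov_inv, x - mu)
print("anomalous:", bool(dist[0] > tau))
```

In an actual PPO-style loop, the window would be appended to after each rollout of N × T transitions and the detector refreshed once per iteration.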