Adversarial Inception Backdoor Attacks against Reinforcement Learning
Authors: Ethan Rathbun, Alina Oprea, Christopher Amato
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here, we evaluate Q-Incept against TrojDRL and SleeperNets, representing the state of the art in forced action manipulation and dynamic reward poisoning attacks. We perform our evaluation in terms of Attack Success Rate (ASR) and Benign Return (BR), relating to our objectives of attack success and attack stealth, respectively, defined below: ASR(π+ \| δ) := E_{s∼S}[π+(δ(s))] and BR(π+ \| M) := E_{s0∼M}[V^M_{π+}(s0)] (Eq. 14), where s0 is a (potentially random) initial state given by M and π+ is the poisoned policy we are evaluating. Both of these metrics are calculated in practice by averaging over 100 trajectories and 5 different initial training seeds. All attacks are evaluated under constrained reward poisoning, defined in Equation 7, requiring each to restrict its reward perturbations to lie within the min and max of the benign rewards observed so far (e.g., lines 3 and 9 in Algorithm 1). We evaluate these attacks using cleanrl's implementation of PPO (Huang et al., 2022) on 7 environments. Atari Q*Bert, Frogger, Pacman, and Breakout (Brockman et al., 2016) represent standard baseline RL tasks used to verify the capabilities of Q-Incept on complex environments. Additionally, CAGE Challenge 2 (Kiely et al., 2023), Highway Merge (Leurent, 2018), and Safety Car (Ji et al., 2023) extend the diversity of our analysis to other domains spanning cyber network defense, simplified self-driving, and safety-aware robotic navigation tasks, respectively. This allows us to verify the effectiveness of Q-Incept across multiple task domains which share little overlap. Further experimental details and results are given in Appendix A.2 and Appendix A.4, respectively. |
| Researcher Affiliation | Academia | 1Khoury College of Computer Sciences, Northeastern University, Boston, MA, United States. |
| Pseudocode | Yes | Algorithm 1 Generalized Inception Attack (Q-Incept) |
| Open Source Code | Yes | Provide in-depth evaluation of Q-Incept on environments spanning Atari game playing, cyber network defense, simplified self-driving, and safety-aware navigation tasks (code is available on GitHub). |
| Open Datasets | Yes | We evaluate these attacks using cleanrl's implementation of PPO (Huang et al., 2022) on 7 environments. Atari Q*Bert, Frogger, Pacman, and Breakout (Brockman et al., 2016) represent standard baseline RL tasks used to verify the capabilities of Q-Incept on complex environments. Additionally, CAGE Challenge 2 (Kiely et al., 2023), Highway Merge (Leurent, 2018), and Safety Car (Ji et al., 2023) extend the diversity of our analysis to other domains spanning cyber network defense, simplified self-driving, and safety-aware robotic navigation tasks, respectively. |
| Dataset Splits | No | The paper uses interactive Reinforcement Learning environments (Atari Q*Bert, Frogger, Pacman, Breakout, CAGE Challenge 2, Highway Merge, Safety Car) rather than static datasets with predefined train/test/validation splits. While evaluation metrics are calculated by 'averaging over 100 trajectories and 5 different initial training seeds', this describes the evaluation methodology rather than a dataset split for reproduction in the traditional sense. |
| Hardware Specification | Yes | We ran tests on a desktop machine (2x RTX 4090, Threadripper 7980X) and found that SleeperNets, TrojDRL, and Q-Incept run at 1038, 987, and 730 simulation steps per second respectively against Atari Q*Bert. |
| Software Dependencies | No | The paper mentions using cleanrl's implementation of PPO (Huang et al., 2022) and that DQN is applied, but specific version numbers for these software libraries or a comprehensive list of versioned dependencies are not provided. |
| Experiment Setup | Yes | In Table 5 we summarize each environment we studied, their properties, and the learning parameters we used in each experiment. Parameters not mentioned in the table are simply the default values chosen in the cleanrl (Huang et al., 2022) implementation of PPO. In Table 6 we summarize the two relevant parameters we varied across environments, Steps per Update and Start Poisoning Threshold. Steps per Update represents the number of benign environment steps that occur between each DQN update of Q̂. On Highway Merge a much lower value was needed here, as the adversary has little time to learn the agent's Q-function. In contrast, for Q*Bert the number of steps per update was very high, as the attack was very successful with little DQN optimization. The Start Poisoning Threshold represents the portion of benign timesteps the PPO agent trains for before the adversary begins poisoning. This parameter is intended to allow the adversary's DQN approximation to begin to converge before poisoning starts. |
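The ASR and BR metrics quoted in the Research Type row are Monte Carlo estimates averaged over trajectories and seeds. A minimal sketch of how such estimates could be computed is below; the function names, the target action, and the sampling setup are illustrative assumptions, not the authors' code.

```python
import numpy as np

def estimate_asr(policy, trigger, sampled_states):
    """Monte Carlo estimate of Attack Success Rate (Eq. 14): the rate at
    which the poisoned policy takes the adversary's target action in
    triggered states. `trigger` applies the perturbation delta to a state.
    TARGET_ACTION is a hypothetical adversarial target, not from the paper."""
    TARGET_ACTION = 0
    hits = [policy(trigger(s)) == TARGET_ACTION for s in sampled_states]
    return float(np.mean(hits))

def estimate_br(returns_per_seed):
    """Benign Return: mean episodic return of the poisoned policy on the
    clean MDP, averaged over trajectories and training seeds (the paper
    reports averages over 100 trajectories and 5 seeds)."""
    return float(np.mean([np.mean(r) for r in returns_per_seed]))
```

For example, a policy that always hits the target action on triggered states yields an ASR of 1.0, and BR is just the grand mean of per-seed average returns.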