Adversarial Inception Backdoor Attacks against Reinforcement Learning

Authors: Ethan Rathbun, Alina Oprea, Christopher Amato

ICML 2025

Reproducibility Variable Result LLM Response
Research Type: Experimental
Here, we evaluate Q-Incept against TrojDRL and SleeperNets, representing the state of the art in forced-action-manipulation and dynamic reward-poisoning attacks. We perform our evaluation in terms of Attack Success Rate (ASR) and Benign Return (BR), relating to our objectives of attack success and attack stealth, respectively, defined below:

    ASR(π+ | δ) := E_{s∼S}[π+(δ(s))]        BR(π+ | M) := E_{s0∼M}[V^M_{π+}(s0)]        (14)

where s0 is a (potentially random) initial state given by M and π+ is the poisoned policy we are evaluating. Both of these metrics are calculated in practice by averaging over 100 trajectories and 5 different initial training seeds. All attacks are evaluated under constrained reward poisoning, defined in Equation 7, which requires each attack to restrict its reward perturbations to lie within the min and max of the benign rewards observed so far (e.g., lines 3 and 9 in Algorithm 1). We evaluate these attacks using cleanrl's implementation of PPO (Huang et al., 2022) on 7 environments. Atari Q*Bert, Frogger, Pacman, and Breakout (Brockman et al., 2016) represent standard baseline tasks in RL, used to verify the capabilities of Q-Incept on complex environments. Additionally, CAGE Challenge 2 (Kiely et al., 2023), Highway Merge (Leurent, 2018), and Safety Car (Ji et al., 2023) extend the diversity of our analysis to cyber network defense, simplified self-driving, and safety-aware robotic navigation tasks, respectively. This allows us to verify the effectiveness of Q-Incept across multiple task domains that share little overlap. Further experimental details and results are given in Appendix A.2 and Appendix A.4, respectively.
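The ASR and BR metrics in Equation 14, and the reward-clipping constraint of Equation 7, can be sketched as follows. This is a minimal illustration, not the paper's code: the function and argument names are hypothetical, and `poisoned_policy` is assumed to return the probability of the adversary's target action in a given state.

```python
import numpy as np

def attack_success_rate(poisoned_policy, apply_trigger, states):
    """ASR(pi+ | delta): average probability that the poisoned policy
    selects the adversary's target action in triggered states delta(s)."""
    return float(np.mean([poisoned_policy(apply_trigger(s)) for s in states]))

def benign_return(episode_returns):
    """BR(pi+ | M): mean episodic return of the poisoned policy on the
    clean MDP M, estimated in practice over sampled trajectories."""
    return float(np.mean(episode_returns))

def clip_poisoned_reward(perturbed_reward, benign_min, benign_max):
    """Constrained reward poisoning (Eq. 7): each perturbed reward is
    kept within the min/max of benign rewards observed so far."""
    return float(np.clip(perturbed_reward, benign_min, benign_max))
```

In the paper's protocol, `benign_return` would be averaged over 100 trajectories and 5 training seeds.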
Researcher Affiliation: Academia
Khoury College of Computer Sciences, Northeastern University, Boston, MA, United States.
Pseudocode: Yes
Algorithm 1: Generalized Inception Attack (Q-Incept)
Open Source Code: Yes
Provide in-depth evaluation of Q-Incept on environments spanning Atari game playing, cyber network defense, simplified self-driving, and safety-aware navigation tasks (code is available on GitHub).
Open Datasets: Yes
We evaluate these attacks using cleanrl's implementation of PPO (Huang et al., 2022) on 7 environments. Atari Q*Bert, Frogger, Pacman, and Breakout (Brockman et al., 2016) represent standard baseline tasks in RL, used to verify the capabilities of Q-Incept on complex environments. Additionally, CAGE Challenge 2 (Kiely et al., 2023), Highway Merge (Leurent, 2018), and Safety Car (Ji et al., 2023) extend the diversity of our analysis to cyber network defense, simplified self-driving, and safety-aware robotic navigation tasks, respectively.
Dataset Splits: No
The paper uses interactive reinforcement learning environments (Atari Q*Bert, Frogger, Pacman, Breakout, CAGE Challenge 2, Highway Merge, Safety Car) rather than static datasets with predefined train/test/validation splits. While evaluation metrics are calculated by "averaging over 100 trajectories and 5 different initial training seeds", this describes the evaluation methodology rather than a dataset split in the traditional sense.
Hardware Specification: Yes
We ran tests on a desktop machine (2x RTX 4090, Threadripper 7980X) and found that SleeperNets, TrojDRL, and Q-Incept run at 1038, 987, and 730 simulation steps per second, respectively, against Atari Q*Bert.
Software Dependencies: No
The paper mentions using "cleanrl's implementation of PPO (Huang et al., 2022)" and that DQN is applied, but specific version numbers for these software libraries, or a comprehensive list of versioned dependencies, are not provided.
Experiment Setup: Yes
In Table 5 we summarize each environment we studied, their properties, and the learning parameters we used in each experiment. Parameters not mentioned in the table are simply the default values chosen in the cleanrl (Huang et al., 2022) implementation of PPO. In Table 6 we summarize the two relevant parameters we varied across environments: Steps per Update and Start Poisoning Threshold. Steps per Update represents the number of benign environment steps that occur between each DQN update of Q̂. On Highway Merge a much lower value was needed, as the adversary has little time to learn the agent's Q-function. In contrast, for Q*Bert the number of steps per update was very high, as the attack was very successful with little DQN optimization. The Start Poisoning Threshold represents the portion of benign timesteps the PPO agent trains for before the adversary begins poisoning. This parameter is intended to allow the adversary's DQN approximation to begin to converge before poisoning starts.
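The two varied parameters described above amount to a simple scheduling rule inside the training loop. A minimal sketch, with hypothetical function names (the paper's implementation builds on cleanrl, but these helpers are illustrative only):

```python
def adversary_should_update(global_step, steps_per_update):
    """Update the adversary's DQN estimate of Q-hat once every
    `steps_per_update` benign environment steps (Table 6)."""
    return global_step % steps_per_update == 0

def poisoning_started(global_step, total_training_steps, start_threshold):
    """Begin poisoning only after `start_threshold` (a fraction in [0, 1])
    of benign training has elapsed, so the adversary's Q-hat
    approximation can begin to converge first."""
    return global_step >= start_threshold * total_training_steps
```

Under this rule, a low `steps_per_update` (as on Highway Merge) updates Q̂ frequently when the adversary has little training time, while a high value (as on Q*Bert) suffices when little DQN optimization is needed.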