Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning
Authors: Sunwoo Lee, Jaebak Hwang, Yonghyeon Jo, Seungyul Han
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the proposed methods on two standard benchmarks in MARL research: the Multi-Agent Particle Environment (MPE) (Lowe et al., 2017) and the StarCraft II Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019), as illustrated in Fig. 6. Specifically, we compare: (1) the impact of the proposed Wolfpack adversarial attack against other adversarial attacks, and (2) the robustness of the WALL framework in defending against such attacks compared to other robust MARL methods. In addition, an ablation study analyzes the effect of the proposed components and hyperparameters on robustness. All results are reported as the mean and standard deviation (shaded areas for graphs and values for tables) across 5 random seeds. |
| Researcher Affiliation | Academia | 1Graduate School of Artificial Intelligence, UNIST, Ulsan, South Korea. Correspondence to: Seungyul Han <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 WALL framework |
| Open Source Code | Yes | Our code is available at https://github.com/sunwoolee0504/WALL. |
| Open Datasets | Yes | We conduct experiments in the MPE (Lowe et al., 2017) and SMAC (Samvelyan et al., 2019) environments. This section provides detailed descriptions of their setup and features. The Multi-Agent Particle Environment (MPE) (Lowe et al., 2017) is a widely used benchmark suite consisting of multi-agent scenarios. The StarCraft Multi-Agent Challenge (SMAC) serves as a benchmark for cooperative Multi-Agent Reinforcement Learning (MARL), focusing on decentralized micromanagement tasks. |
| Dataset Splits | No | The paper uses the Multi-Agent Particle Environment (MPE) and StarCraft II Multi-Agent Challenge (SMAC), which are simulation environments for Multi-Agent Reinforcement Learning. While various scenarios are described (e.g., PP 3/1, 8m, MMM), these represent different environment configurations or tasks, not predefined training, validation, or test dataset splits in the traditional sense for a static dataset. The paper mentions training policies for 3M timesteps but does not specify how the environmental interactions are formally split for training and evaluation. |
| Hardware Specification | Yes | All experiments in this paper are conducted on a GPU server equipped with an NVIDIA GeForce RTX 3090 GPU and AMD EPYC 7513 32-Core processors running Ubuntu 20.04 and PyTorch. |
| Software Dependencies | No | The paper mentions "Ubuntu 20.04 and PyTorch" and refers to the PyMARL codebase and QPLEX official codebase. However, it does not specify the version number for PyTorch or the exact versions of the PyMARL and QPLEX codebases used, which are key software components for reproducibility. |
| Experiment Setup | Yes | We conduct a parameter search for the number of Wolfpack attacks K_WP ∈ {1, 2, 3, 4}, the attack duration t_WP ∈ {1, 2, 3, 4}, the number of follow-up agents m, and the temperature T ∈ {0.1, 0.2, 0.5, 1.0}. The Q-learning hyperparameters (shared across all CTDE methods) and those specific to the CTDE algorithms are detailed in Table C.1 and Table C.2, respectively. The Wolfpack adversarial attack-related hyperparameters for the WALL framework, shared across all SMAC scenarios and scenario-specific setups, are presented in Table C.3. |
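The experiment setup above amounts to a grid search over the listed attack hyperparameters, with each configuration scored as mean and standard deviation over 5 random seeds. A minimal sketch of that protocol follows; the function `evaluate_config` and the search range for the number of follow-up agents `m` are assumptions for illustration (the paper does not list m's range), not the authors' actual implementation.

```python
import itertools
import statistics

# Search ranges quoted from the paper's parameter search.
K_WP = [1, 2, 3, 4]              # number of Wolfpack attacks
T_WP = [1, 2, 3, 4]              # attack duration
TEMPERATURE = [0.1, 0.2, 0.5, 1.0]
SEEDS = range(5)                 # results averaged over 5 random seeds

def evaluate_config(k_wp: int, t_wp: int, temp: float, seed: int) -> float:
    """Hypothetical placeholder: train WALL under the Wolfpack attack with
    the given hyperparameters and seed, then return the evaluation return."""
    return 0.0  # stand-in value

# Grid search: for each configuration, aggregate over seeds as
# (mean, standard deviation), matching how results are reported.
results = {}
for k_wp, t_wp, temp in itertools.product(K_WP, T_WP, TEMPERATURE):
    returns = [evaluate_config(k_wp, t_wp, temp, s) for s in SEEDS]
    results[(k_wp, t_wp, temp)] = (statistics.mean(returns),
                                   statistics.stdev(returns))
```

In a real run, `evaluate_config` would be replaced by a full training and evaluation loop over the PyMARL/QPLEX codebases referenced in the paper.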