Reidentify: Context-Aware Identity Generation for Contextual Multi-Agent Reinforcement Learning

Authors: Zhiwei Xu, Kun Hu, Xin Xin, Weiliang Meng, Yiwei Shi, Hangyu Mao, Bin Zhang, Dapeng Li, Jiangjin Yin

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments on CMARL benchmarks demonstrate that CAID significantly outperforms existing approaches by enhancing both sample efficiency and generalization across diverse context variants. 5. Experiments In this section, we evaluate CAID in three well-known CMARL environments: StarCraft Multi-Agent Challenge (SMACv2) (Ellis et al., 2023), Vectorized Multi-Agent Simulator (VMAS) (Bettini et al., 2022), and Traffic Signal Control (PyTSC) (Bokade & Jin, 2024). The tasks in these domains exhibit considerable variability across episodes, primarily in the agents' positions, agent types, and target locations. First, the performance of CAID is evaluated through a comparison with several classical algorithms, including Weighted QMIX (Rashid et al., 2020), QPLEX (Wang et al., 2021a), and the baseline QMIX (Rashid et al., 2018), along with recently proposed methods such as RIIT (Hu et al., 2021), COLA (Xu et al., 2023), and VMIX (Su et al., 2021). Then the contributions of individual modules within the framework are discussed. To ensure the reliability of the results, each experiment is repeated five times with different random seeds. For a fair comparison, all hyperparameters, except those introduced specifically by CAID, are kept consistent with the original methods. Unless explicitly stated, CAID refers to the variant implemented on QMIX. Details on algorithm hyperparameters are provided in Appendix A. 5.1. StarCraft Multi-Agent Challenge 5.2. Vectorized Multi-Agent Simulator 5.3. Ablation Study
Researcher Affiliation Collaboration Zhiwei Xu (1), Kun Hu (2), Xin Xin (1), Weiliang Meng (3), Yiwei Shi (4), Hangyu Mao (5), Bin Zhang (3), Dapeng Li (3), Jiangjin Yin (6). 1 Shandong University; 2 National University of Defense Technology; 3 Institute of Automation, Chinese Academy of Sciences; 4 University of Bristol; 5 Kuaishou Technology; 6 Huazhong Agricultural University. Correspondence to: Jiangjin Yin <EMAIL>.
Pseudocode Yes A.1. Algorithmic Description The pseudo-code of CAID is shown in Algorithm 1.

Algorithm 1 Context-Aware Identity Generation (CAID)
1: for each episode do
2:   Get the global state s_1 and the local observations z_1 = {z_1^1, z_1^2, ..., z_1^n} of all agents
3:   for t ← 1 to T − 1 do
4:     for a ← 1 to n do
5:       Select action u_t^a according to the agent network
6:     end for
7:     Carry out the joint action u_t = {u_t^1, ..., u_t^n}
8:     Get the global reward r_{t+1}, the next local observations z_{t+1}, and the next state s_{t+1}
9:   end for
10:  Store the trajectory in the replay buffer D.
11:  Sample a batch of episodes B ∼ Uniform(D).
12:  Compute the context variable ĉ using Equation (2).
13:  Generate agent identities based on Equation (8).
14:  Evaluate the transformed Q-values for all agents using Equation (6).
15:  Update the parameters of the CAID model using Equation (7).
16:  Periodically update the parameters of the target network.
17: end for
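The control flow of Algorithm 1 can be sketched in Python. This is a minimal skeleton only: the paper's Equations (2) and (6)–(8) are not reproduced in this review, so the context encoder, identity generator, Q-value transform, and parameter update appear as hypothetical method stubs (`infer_context`, `generate_identities`, `transformed_q`, `update`, `sync_target`) on an assumed `agent_net` object.

```python
import random
from collections import deque

def run_caid_training(env, agent_net, n_agents, n_episodes, T,
                      buffer_capacity=5000, batch_size=128,
                      target_update_interval=200):
    """Skeleton of Algorithm 1; all model-specific steps are stubs."""
    replay = deque(maxlen=buffer_capacity)  # replay buffer D
    for episode in range(n_episodes):
        state, obs = env.reset()            # global state s_1, local obs z_1
        trajectory = []
        for t in range(T - 1):
            # each agent selects its action from the agent network
            actions = [agent_net.act(obs[a], agent_id=a) for a in range(n_agents)]
            reward, obs, state = env.step(actions)      # joint action u_t
            trajectory.append((state, obs, actions, reward))
        replay.append(trajectory)                       # store trajectory in D
        if len(replay) >= batch_size:
            batch = random.sample(replay, batch_size)   # B ~ Uniform(D)
            c_hat = agent_net.infer_context(batch)            # Eq. (2), stub
            identities = agent_net.generate_identities(c_hat) # Eq. (8), stub
            q_values = agent_net.transformed_q(batch, identities)  # Eq. (6), stub
            agent_net.update(q_values)                        # Eq. (7), stub
        if episode % target_update_interval == 0:
            agent_net.sync_target()  # periodic target-network update
```

The stubs would be replaced by the actual CAID networks; the loop itself mirrors the episode collection, uniform replay sampling, and periodic target synchronization steps of the pseudocode.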
Open Source Code No The paper does not contain any explicit statements about releasing source code for the methodology described, nor does it provide a link to a code repository. It mentions 'PyMARL2 (Hu et al., 2021)' in Appendix A.2, but this refers to a third-party framework used for hyperparameter settings, not the authors' own code.
Open Datasets Yes In this section, we evaluate CAID in three well-known CMARL environments: StarCraft Multi-Agent Challenge (SMACv2) (Ellis et al., 2023), Vectorized Multi-Agent Simulator (VMAS) (Bettini et al., 2022), and Traffic Signal Control (PyTSC) (Bokade & Jin, 2024).
Dataset Splits No The paper describes dynamic environments where contexts and agent configurations change per episode or task variant, but it does not specify explicit training/validation/testing splits for a fixed dataset, which is what data-partitioning reproducibility typically requires. For instance, in SMACv2, "agents' positions and types can change dynamically in each episode" and in PyTSC, "the contexts (e.g., traffic flow) in each episode are dynamically randomized". While these describe the varying nature of the data, they do not provide fixed dataset splits.
Hardware Specification Yes All experiments in this study were conducted using NVIDIA GeForce RTX 2080 Ti graphics cards and Intel(R) Xeon(R) Silver 4114 CPUs.
Software Dependencies No The paper mentions several software components like 'PyMARL2 (Hu et al., 2021)', 'SUMO (Lopez et al., 2018)', and 'CityFlow (Zhang et al., 2019)' in the context of environments or hyperparameter settings. However, it does not provide specific version numbers for any of these components, which is necessary for reproducible software dependencies. It also does not mention versions for general programming languages or deep learning frameworks like Python or PyTorch.
Experiment Setup Yes A.2. Hyperparameters Unless specified otherwise, the hyperparameter configurations across different environments are presented in Table 2. These settings are identical to those provided in PyMARL2 (Hu et al., 2021). All experiments in this study were conducted using NVIDIA GeForce RTX 2080 Ti graphics cards and Intel(R) Xeon(R) Silver 4114 CPUs. For all methods, exploration during training is achieved via independent ϵ-greedy action selection, with ϵ linearly annealed from 1.0 to 0.05 over 50,000 steps. In SMACv2, training ends after 5 million timesteps, whereas it concludes after 1 million timesteps in VMAS and 2 million timesteps in PyTSC.

Table 2. Hyperparameter settings.
Description                                   Value
Learning rate                                 0.001
Type of optimizer                             Adam
Episodes between target-network updates       200
Global gradient-norm clipping                 10
Batch size                                    128
Capacity of replay buffer                     5000
Batch size for parallel execution             8
Discount factor                               0.99
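The exploration schedule and Table 2 values quoted above are concrete enough to write down directly. The sketch below encodes the linear ϵ annealing (1.0 → 0.05 over 50,000 steps) and collects the Table 2 hyperparameters in a plain dictionary; the function and key names are this review's own, not the paper's.

```python
def epsilon(step, start=1.0, end=0.05, anneal_steps=50_000):
    """Linearly anneal the exploration rate from `start` to `end`,
    then hold it at `end` (values from the paper's Appendix A.2)."""
    if step >= anneal_steps:
        return end
    return start + (end - start) * step / anneal_steps

# Hyperparameters from Table 2, gathered as a config dict
CONFIG = {
    "learning_rate": 0.001,
    "optimizer": "Adam",
    "target_update_interval_episodes": 200,
    "grad_norm_clip": 10,
    "batch_size": 128,
    "replay_buffer_capacity": 5000,
    "parallel_batch_size": 8,
    "discount_factor": 0.99,
}
```

For example, halfway through annealing, `epsilon(25_000)` evaluates to 0.525, and any step at or beyond 50,000 returns the floor of 0.05.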