Reconstruction-Guided Policy: Enhancing Decision-Making through Agent-Wise State Consistency
Authors: Qifan Liang, Yixiang Shan, Haipeng Liu, Zhengbang Zhu, Ting Long, Weinan Zhang, Yuan Tian
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate RGP, we conduct extensive experiments in discrete and continuous environments; the results demonstrate its effectiveness. We particularly focus on the following research questions: i) How does RGP perform compared with other methods (RQ1)? ii) Can RGP reduce the gap between training and execution (RQ2)? iii) Why can RGP achieve better performance than other methods (RQ3)? iv) Can RGP explore the potential relationships between agents (RQ4)? v) Can RGP adapt to continuous-action environments (RQ5)? vi) How does RGP perform under more challenging partially observable conditions (RQ6)? The results are illustrated in Table 1. |
| Researcher Affiliation | Academia | Qifan Liang1, Yixiang Shan1, Haipeng Liu1, Zhengbang Zhu2, Ting Long1, Weinan Zhang2, Yuan Tian1; 1 Jilin University, 2 Shanghai Jiao Tong University |
| Pseudocode | Yes | Algorithm 1: Training of RGP with value decomposition methods; Algorithm 2: Training of RGP with policy gradient methods |
| Open Source Code | Yes | Our code is publicly available at https://github.com/Muise4/RGP4/tree/main |
| Open Datasets | Yes | Environments. We primarily evaluated RGP on SMAC (Samvelyan et al., 2019) and SMACv2 (Ellis et al., 2024). SMAC is the most widely used discrete multi-agent environment, while SMACv2 introduces stochasticity based on SMAC. We set up the SMACv2 maps with 5 ally agents against 5 enemies. Additionally, to further demonstrate the portability of RGP, we conducted experiments in continuous predator-prey and continuous cooperative navigation scenarios (Lowe et al., 2017). |
| Dataset Splits | No | The paper describes environmental setups and experimental parameters (e.g., number of agents, prey) but does not provide explicit training/test/validation splits for static datasets. The 'data' is generated dynamically through interaction with the simulated environments. |
| Hardware Specification | Yes | Our model was trained on a setup with 4 NVIDIA A40 GPUs, an Intel Gold 5220 CPU, and 504GB of memory, optimized using the Adam optimizer (Kingma & Ba, 2014). |
| Software Dependencies | No | The paper mentions using the "Adam optimizer" and refers to "PyMARL2 (Hu et al., 2021)", but it does not specify version numbers for these or other key software components, which is required for reproducibility. |
| Experiment Setup | Yes | Implementation Details. Our model was trained on a setup with 4 NVIDIA A40 GPUs, an Intel Gold 5220 CPU, and 504GB of memory, optimized using the Adam optimizer (Kingma & Ba, 2014). Due to limited computational resources, we replaced the U-Net used in the original DDPM paper (Ho et al., 2020; Rombach et al., 2022) with an MLP. We set the diffusion timestep to 10 and the number of attention heads to 4. The details of other hyperparameters can be found in Table 4 of Appendix A.2. Appendix A.2 HYPERPARAMETERS DETAIL. Details of RGP's hyperparameters are provided in Table 4. The baselines VDN, QMIX, and QPLEX were implemented with the hyperparameters of PyMARL2 (Hu et al., 2021). HPN-QMIX, CADP, PTDE, and SIDiff were implemented with their optimal hyperparameters, as specified in their respective papers (Jianye et al., 2022; Zhou et al., 2023; Chen et al., 2022; Xu et al., 2024). Table 4: Hyperparameter settings for RGP training (including diffusion timestep, optimizer type, learning rate, batch size, TD lambda, training epochs, buffer size, target update interval, attention heads, attention embedding dim, and agent information mapping dim). |
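The implementation details quoted above (a 10-step diffusion process, an MLP in place of the U-Net denoiser, 4 attention heads) can be sketched as a minimal DDPM-style forward-diffusion setup. This is a hedged illustration, not the authors' code: only `diffusion_timesteps = 10`, `attention_heads = 4`, and the Adam optimizer come from the paper; the noise schedule, tensor shapes, and learning rate are illustrative assumptions.

```python
import numpy as np

# Hyperparameters: the first three values are stated in the paper;
# the learning rate is an assumption (Table 4 lists it but this
# excerpt does not give the value).
CONFIG = {
    "diffusion_timesteps": 10,   # stated in the paper
    "attention_heads": 4,        # stated in the paper
    "optimizer": "Adam",         # stated in the paper
    "learning_rate": 5e-4,       # assumption, not from the excerpt
}

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.2):
    """Linear noise schedule over T steps (a standard DDPM choice)."""
    return np.linspace(beta_start, beta_end, T)

def forward_diffuse(x0, t, alphas_cumprod, rng):
    """Closed-form q(x_t | x_0): sqrt(a_bar)*x0 + sqrt(1 - a_bar)*eps."""
    eps = rng.standard_normal(x0.shape)
    a_bar = alphas_cumprod[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps, eps

T = CONFIG["diffusion_timesteps"]
betas = linear_beta_schedule(T)
alphas_cumprod = np.cumprod(1.0 - betas)  # monotonically decreasing in t

rng = np.random.default_rng(0)
x0 = rng.standard_normal((5, 32))  # e.g. 5 agents, 32-dim state embedding
x_t, eps = forward_diffuse(x0, t=T - 1, alphas_cumprod=alphas_cumprod, rng=rng)
print(x_t.shape)  # (5, 32)
```

The 10-step horizon (versus the hundreds of steps typical for image DDPMs) is consistent with the paper's compute-saving substitution of an MLP for the U-Net.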
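The pseudocode entry names Algorithm 1 (training RGP with value decomposition methods), and Table 4 lists a TD-lambda coefficient among the hyperparameters. As a small self-contained illustration of the TD(λ) target that PyMARL2-style value-decomposition trainers bootstrap against (not the paper's actual code; `td_lambda_target` and its defaults are hypothetical), here is the standard backward recursion:

```python
import numpy as np

def td_lambda_target(rewards, next_values, gamma=0.99, lam=0.8):
    """Backward-recursive TD(lambda) return.

    rewards[t]     -- reward at step t
    next_values[t] -- bootstrap value estimate V(s_{t+1})
    Recursion: G[t] = r[t] + gamma*((1-lam)*V(s_{t+1}) + lam*G[t+1]).
    """
    T = len(rewards)
    G = np.zeros(T)
    G[-1] = rewards[-1] + gamma * next_values[-1]
    for t in range(T - 2, -1, -1):
        G[t] = rewards[t] + gamma * ((1 - lam) * next_values[t]
                                     + lam * G[t + 1])
    return G

# Sanity check: with gamma=1, lam=1, and zero bootstrap values, the
# target reduces to the undiscounted return-to-go.
G = td_lambda_target(np.array([1.0, 1.0, 1.0]), np.zeros(3),
                     gamma=1.0, lam=1.0)
print(G)  # [3. 2. 1.]
```

With λ = 0 this collapses to a one-step TD target, and with λ = 1 to a Monte Carlo return; the paper's reported TD-lambda setting would sit between these extremes.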