Boosting Multi-agent Reinforcement Learning via Contextual Prompting

Authors: Yue Deng, Zirui Wang, Xi Chen, Yin Zhang

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on three tasks, Spread, Tag, and Reference, from the Particle World Environment (PWE) show that our framework significantly accelerates the training process of existing state-of-the-art CTDE and non-CTDE MARL methods, while also competing with or outperforming their original versions.
Researcher Affiliation | Academia | Yue Deng (EMAIL), Zirui Wang (EMAIL), Xi Chen (EMAIL), Yin Zhang (EMAIL); College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Pseudocode | Yes | Algorithm 1: The overall algorithm of our framework; Algorithm 2: Trajectory tree update algorithm
Open Source Code | No | The paper mentions using third-party codebases such as 'pymarl' and 'petting-zoo' for implementations of algorithms and environments, but it does not provide an explicit statement or link to the source code for the novel framework described in this paper.
Open Datasets | Yes | Experiments use three tasks, Spread, Tag, and Reference, from the Particle World Environment (PWE). "Experiments on this environment are based on the implementation of petting-zoo (Terry et al., 2021) and the settings follow the default parameters." Additionally, the approach is tested on three tasks of different difficulties from the SMAC (Samvelyan et al., 2019) environment.
Dataset Splits | No | The paper describes experiments in reinforcement learning environments, mentioning the use of '5 seeds' for running experiments to calculate averages and variances. However, it does not detail training, validation, or test dataset splits in terms of percentages, sample counts, or specific predefined partitions, as is typical in supervised learning. The 'testing curves' presented are based on evaluation within the dynamic RL environment rather than a pre-partitioned static test set.
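The "5 seeds" protocol mentioned above can be made concrete with a minimal aggregation sketch; this is illustrative only, not the authors' evaluation code, and the run data below is hypothetical.

```python
# Illustrative point-wise aggregation of per-seed evaluation returns into a
# mean curve with spread, as done for the paper's testing curves (5 seeds).
from statistics import mean, pstdev

def aggregate_seeds(returns_per_seed):
    """returns_per_seed: equal-length lists of returns, one list per seed.
    Returns (mean_curve, std_curve), aggregated point-wise across seeds."""
    by_step = list(zip(*returns_per_seed))  # group values by evaluation step
    return ([mean(step) for step in by_step],
            [pstdev(step) for step in by_step])

# e.g. 3 evaluation points from 5 hypothetical seed runs
runs = [[0.1, 0.4, 0.9], [0.2, 0.5, 0.8], [0.0, 0.3, 1.0],
        [0.1, 0.6, 0.9], [0.1, 0.2, 0.9]]
mean_curve, std_curve = aggregate_seeds(runs)  # mean_curve[0] == 0.1
```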
Hardware Specification | No | The paper implies the use of computational resources by stating, 'Empirically in this paper, with the help of accelerating packages and the choice of hyper-parameters, the overall time used by our framework is twice larger than that of original MARL algorithms.' However, it does not provide any specific details about the hardware used, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | "Experiments on this environment are based on the implementation of petting-zoo (Terry et al., 2021) and the settings follow the default parameters." The QMIX and IQL algorithms are provided by pymarl (Samvelyan et al., 2019), a framework for deep multi-agent reinforcement learning. While software components like 'petting-zoo' and 'pymarl' are cited, no specific version numbers for these or any other software dependencies are given.
Experiment Setup | Yes | Table 3: Hyper-parameters in experiment tasks

Parameters | Spread | Tag | Reference
minimum observation similarity | 0.1 | 0.1 | 0.1
sample width | 15 | 15 | 15
prune width | 7 | 7 | 7
trajectory depth | 25 | 20 | 25
prune depth | 13 | 10 | 13
sample interval | 400 | 400 | 400
λ | 1/n | 1/n | 1/n

Except for the hyper-parameters listed in the table, the parameters for the QMIX and IQL algorithms, as well as the replay buffer size, learning rates, and optimizer, are the default values provided by the pymarl code base.
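Table 3 can be sketched as a per-task configuration; the key names below are illustrative assumptions, not identifiers from the authors' code, and `lam` simply encodes the table's λ = 1/n entry (n = number of agents).

```python
# Hypothetical encoding of Table 3's per-task hyper-parameters.
# Key names are illustrative; values are taken directly from the table.
HYPER_PARAMS = {
    "Spread":    dict(min_obs_similarity=0.1, sample_width=15, prune_width=7,
                      trajectory_depth=25, prune_depth=13, sample_interval=400),
    "Tag":       dict(min_obs_similarity=0.1, sample_width=15, prune_width=7,
                      trajectory_depth=20, prune_depth=10, sample_interval=400),
    "Reference": dict(min_obs_similarity=0.1, sample_width=15, prune_width=7,
                      trajectory_depth=25, prune_depth=13, sample_interval=400),
}

def lam(n_agents: int) -> float:
    """lambda = 1/n for all three tasks (n = number of agents)."""
    return 1.0 / n_agents
```

All remaining settings (replay buffer size, learning rates, optimizer) would then come from pymarl's defaults, per the text above.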