Efficient Exploration in Multi-Agent Reinforcement Learning via Farsighted Self-Direction

Authors: Tiancheng Lao, Xudong Guo, Mengge Liu, Junjie Yu, Yi Liu, Wenhui Fan

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate the method on didactic examples and demonstrate the outperformance of our method on challenging StarCraft II micromanagement tasks. ... Empirical results show that FSD achieves state-of-the-art performance compared to several widely adopted baseline methods. Notably, unlike previous curiosity-driven methods, FSD does not use an explorer, thereby saving a considerable amount of computational resources. Additionally, we demonstrate the effectiveness of Q_i^int and clipped double Q-learning separately through ablation studies.
Researcher Affiliation | Academia | All six authors (Tiancheng Lao EMAIL, Xudong Guo EMAIL, Mengge Liu EMAIL, Junjie Yu EMAIL, Yi Liu EMAIL, Wenhui Fan EMAIL) are affiliated with the Department of Automation, Tsinghua University, Beijing, China.
Pseudocode | No | The paper describes the FSD method using mathematical equations and textual explanations, but it does not include any clearly labeled pseudocode blocks or algorithms.
Open Source Code | No | The paper states: 'Baselines are trained using their open-source codes, with some results derived from the open-source results of EXPODE (Zhang & Yu, 2023).' This refers to the code of other methods, not the authors' own implementation of FSD. There is no explicit statement or link provided for the FSD source code.
Open Datasets | Yes | We validate our method on didactic examples used in EMC (Zheng et al., 2021). ... Subsequently, we evaluated FSD on Predator Prey (Rashid et al., 2020) and several challenging StarCraft II micromanagement tasks (Samvelyan et al., 2019).
Dataset Splits | No | For Simultaneous Arrival, the paper notes: 'the standard setting of the scenario lacks randomness, meaning that during evaluation, the win rate is either 0 or 1 for a fixed policy, and the curves essentially represent the proportion of wins across five different runs.' For Predator Prey and SMAC, the paper refers to the benchmarks themselves but does not explicitly describe how the data was split into training, validation, or test sets for its experiments.
Hardware Specification | Yes | Experiments are conducted on eight NVIDIA RTX 4090s, with training time ranging from half an hour to 10 hours, depending on the complexity of the task and the number of agents involved.
Software Dependencies | No | The paper does not explicitly mention any specific software dependencies or library versions used for the implementation (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | In our experiments, we set p to a relatively high value of 3. ... As for evaluation, the action is selected greedily based on the controller Q_i^φ. All experiments have been repeated for five runs over different random seeds. ... Effect of exploration coefficient α: the paper analyzes the effect of the exploration coefficient α in eq. 4 on performance using FSD-sgl-VDN and FSD in Predator Prey and the MMM2 map of SMAC, respectively. Fig. 8 shows that, in general, a coefficient between 0.01 and 0.1 can effectively improve exploration efficiency. In the relatively simple Predator-Prey scenario, where coordinated exploration is crucial, α can take a larger value, such as 1, to further enhance exploration. However, on the more complex MMM2 map, setting α = 1 leads to excessive exploration by the agents, which ultimately reduces their learning efficiency.
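The exploration coefficient α in the setup above weights an intrinsic bonus against the extrinsic task reward (eq. 4 in the paper). A minimal sketch of this weighting, with the function name and signature assumed for illustration (not the authors' code):

```python
def shaped_reward(r_ext: float, r_int: float, alpha: float = 0.05) -> float:
    """Combine extrinsic and intrinsic rewards; alpha is the exploration
    coefficient from eq. 4. Per the paper's analysis, 0.01-0.1 generally
    works well, while alpha = 1 over-explores on complex maps like MMM2."""
    return r_ext + alpha * r_int


# Sweeping alpha as in the paper's ablation (hypothetical values shown):
for alpha in (0.01, 0.1, 1.0):
    print(alpha, shaped_reward(r_ext=1.0, r_int=0.5, alpha=alpha))
```

The sweep mirrors the reported trade-off: larger α boosts exploration in simple coordinated tasks but can overweight the intrinsic signal elsewhere.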
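The ablation studies also isolate clipped double Q-learning. A generic sketch of the clipped target, taking the minimum of two target networks' greedy values to curb overestimation bias (a TD3-style construction; the function and its signature are illustrative, not the authors' implementation):

```python
import numpy as np


def clipped_double_q_target(r, gamma, done, q1_next, q2_next):
    """Clipped double Q target for a batch of transitions.

    q1_next, q2_next: (batch, n_actions) Q-values from two target networks
    for the next state. The elementwise minimum of their greedy values
    bounds the bootstrap target from above, reducing overestimation.
    """
    v1 = q1_next.max(axis=-1)  # greedy value under target network 1
    v2 = q2_next.max(axis=-1)  # greedy value under target network 2
    return r + gamma * (1.0 - done) * np.minimum(v1, v2)
```

With r = 1, γ = 0.9, and greedy values 3.0 and 2.5 from the two networks, the target is 1 + 0.9 · min(3.0, 2.5) = 3.25.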