Towards Efficient Collaboration via Graph Modeling in Reinforcement Learning
Authors: Wenzhe Fan, Zishun Yu, Chengdong Ma, Changye Li, Yaodong Yang, Xinhua Zhang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results in networked systems such as traffic scheduling and power control demonstrate that f-MAT achieves superior performance compared to strong baselines, thereby paving the way for handling complex collaborative problems. We evaluate the performance and efficiency of f-MAT in grid alignment, traffic scheduling, and power control. Empirical results demonstrate that f-MAT collaborates more efficiently than the other baselines, paving the way for efficient collaboration in multi-agent systems. |
| Researcher Affiliation | Academia | Wenzhe Fan (1), Zishun Yu (1), Chengdong Ma (2), Changye Li (3), Yaodong Yang (2), Xinhua Zhang (1). Affiliations: (1) University of Illinois Chicago; (2) Institute for Artificial Intelligence, Peking University; (3) Yuanpei College, Peking University. |
| Pseudocode | Yes | The pseudo code of f-MAT can be found in Appendix A. The complete pseudocode for f-MAT’s encoder and decoder can be found in Algorithm 1 in Appendix A. The method is detailed in Algorithm 3 in Appendix A. |
| Open Source Code | No | No explicit statement about open-source code or repository links was found. |
| Open Datasets | Yes | Our first experiment is on a simplified domain of traffic flow (Zhang, Aberdeen, and Vishwanathan 2007), called Grid Sim... The second environment adapted the Simulation of Urban Mobility (SUMO, Chen et al. 2020; Ault and Sharon 2021)... We have two microgrid systems (Chen et al. 2021)... |
| Dataset Splits | No | The paper describes the experimental environments and their characteristics, but it does not specify any training, validation, or test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python version, library versions, or specific solver versions) used in the experiments. |
| Experiment Setup | Yes | As shown in Fig. 6a, Lenc = 3 produces the most stable trend and achieves the highest reward. Based on the above experiments, we recommend setting Lenc = 3, which we used to produce our main results. To explore the relationship between Lenc and group size, we use the optimality gap, the difference between the true optimal reward and the reward achieved by the learned policy, to illustrate the variations. |
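
The optimality gap quoted in the Experiment Setup row can be sketched in a few lines. This is a minimal illustration that assumes the gap is the true optimal reward minus the learned reward, so a smaller value is better. The function names and the idea of keying results by Lenc are assumptions made for this sketch; they are not taken from the paper or from any released code.

```python
from typing import Dict, Mapping


def optimality_gap(optimal_reward: float, learned_reward: float) -> float:
    """Return the true optimal reward minus the learned reward.

    Assumed sign convention: a gap of 0 means the learned policy
    matches the optimum, and larger values mean worse performance.
    """
    return optimal_reward - learned_reward


def gaps_by_setting(
    optimal_reward: float, learned_rewards: Mapping[int, float]
) -> Dict[int, float]:
    """Compute the gap for each setting (hypothetically, each Lenc value).

    `learned_rewards` maps a setting to the reward the algorithm reached
    under that setting; the caller supplies these values.
    """
    return {
        setting: optimality_gap(optimal_reward, reward)
        for setting, reward in learned_rewards.items()
    }
```

Calling `gaps_by_setting` once per group size, with the rewards measured for each Lenc, gives the kind of table one could use to compare encoder depths.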