Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization
Authors: Hoi-To Wai, Zhuoran Yang, Zhaoran Wang, Mingyi Hong
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In §4 we illustrate the empirical performance of the proposed algorithm. [...] To verify the performance of our proposed method, we conduct an experiment on the mountaincar dataset [46] under a setting similar to [15]: to collect the dataset, we ran Sarsa with d = 300 features to obtain the policy, then we generate the trajectories of actions and states according to the policy with M samples. |
| Researcher Affiliation | Academia | Hoi-To Wai (The Chinese University of Hong Kong, Shatin, Hong Kong); Zhuoran Yang (Princeton University, Princeton, NJ, USA); Zhaoran Wang (Northwestern University, Evanston, IL, USA); Mingyi Hong (University of Minnesota, Minneapolis, MN, USA) |
| Pseudocode | Yes | Algorithm 1: PD-DistIAG Method for Multi-agent, Primal-dual, Finite-sum Optimization |
| Open Source Code | No | The paper does not contain any concrete access information (e.g., a specific repository link, an explicit code release statement, or mention of code in supplementary materials) for the source code of the methodology. |
| Open Datasets | Yes | To verify the performance of our proposed method, we conduct an experiment on the mountaincar dataset [46] under a setting similar to [15]: to collect the dataset [...] |
| Dataset Splits | No | The paper mentions 'M = 5000 samples' but does not specify the train/validation/test dataset splits (e.g., percentages, sample counts for each split, or reference to predefined splits). |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using Sarsa and comparing with PDBG, GTD2, and SAGA, but it does not specify version numbers for any software dependencies or libraries required to replicate the experiments. |
| Experiment Setup | Yes | For PD-DistIAG, we simulate a communication network with N = 10 agents, connected on an Erdős-Rényi graph generated with connectivity of 0.2; for the step sizes, we set γ1 = 0.005/λmax(Â), γ2 = 5 × 10⁻³. For this problem, we have d = 300, M = 5000 samples, and there are N = 10 agents. |
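
The communication network described in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that "connectivity of 0.2" means an independent edge probability p = 0.2 in the G(n, p) model, and it assumes that disconnected draws are simply resampled, which the source does not state. The function names, the seed, and the `max_tries` cap are hypothetical.

```python
import random


def erdos_renyi(n, p, rng):
    """Sample an undirected G(n, p) graph as an adjacency list (dict of sets)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            # Each possible edge is included independently with probability p.
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def is_connected(adj):
    """Depth-first search from node 0; the graph is connected iff all nodes are reached."""
    seen, stack = {0}, [0]
    while stack:
        for v in adj[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(adj)


def sample_connected_graph(n=10, p=0.2, seed=0, max_tries=10_000):
    """Resample until the graph is connected.

    Assumption: the source does not say how disconnected draws are handled;
    resampling is one common choice.
    """
    rng = random.Random(seed)
    for _ in range(max_tries):
        adj = erdos_renyi(n, p, rng)
        if is_connected(adj):
            return adj
    raise RuntimeError("no connected graph found within max_tries")


# N = 10 agents, edge probability 0.2 (values taken from the setup row).
graph = sample_connected_graph()
for agent, neighbors in graph.items():
    print(agent, sorted(neighbors))
```

Connectivity matters because a decentralized method can only reach consensus across all agents if information can propagate between every pair of nodes; with n = 10 and p = 0.2 many raw draws are disconnected, so some handling of that case is needed.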