Multi-Agent Multi-Armed Bandits with Limited Communication
Authors: Mridul Agarwal, Vaneet Aggarwal, Kamyar Azizzadenesheli
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We consider various problem setups to evaluate our algorithms. We compare the proposed algorithms, LCC-UCB and LCC-UCB-GRAPH, with a no-communication strategy and a full communication strategy. ... We present the result in Fig. 2 for 30 independent runs for expected rewards drawn from uniform U(0, 1) distribution. |
| Researcher Affiliation | Academia | Mridul Agarwal EMAIL Vaneet Aggarwal EMAIL Kamyar Azizzadenesheli EMAIL Purdue University |
| Pseudocode | Yes | Algorithm 1 LCC-UCB(n, Sn, [N] \ {n}, T) ... Algorithm 2 UCB(n, A, Tj) ... Algorithm 3 LCC-UCB-GRAPH(Sn, G, T0, T) |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper describes simulations using expected rewards drawn from a uniform U(0,1) distribution and generated Erdős-Rényi graphs, but does not provide concrete access information (link, DOI, repository, or citation) for any pre-existing publicly available dataset. |
| Dataset Splits | No | The paper describes generating data for simulations and conducting '30 independent runs,' but does not specify dataset splits (e.g., training, validation, test percentages or counts) as it does not use a fixed, pre-existing dataset. |
| Hardware Specification | No | The paper discusses experimental evaluations and simulations but does not provide specific details about the hardware used, such as GPU/CPU models, processors, or memory specifications. |
| Software Dependencies | No | The paper does not explicitly list any specific software dependencies with version numbers (e.g., programming languages, libraries, or solvers with their versions) that would be needed to replicate the experiments. |
| Experiment Setup | Yes | We consider a horizon of T = 10^5 steps. We study the behaviour of the algorithm by varying the number of agents N and the number of arms K. We choose three pairs (N, K), which are (10, 100), (20, 100), (10, 200). We present the result in Fig. 2 for 30 independent runs for expected rewards drawn from uniform U(0, 1) distribution. ... We specifically consider Erdős–Rényi graphs G(N, p) where the N >= 100 vertices are a swarm of N agents. Also, p = 10 ln(N)/N is the edge selection probability. ... We again consider 3 cases of (N, K) which are (100, 250), (150, 250), and (100, 500). We present the result in Fig. 4 for 30 independent runs. |
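The setup quoted above (K arms with expected rewards drawn from U(0, 1), a UCB index per agent, regret averaged over independent runs) can be sketched with a minimal single-agent UCB1 simulation. This is an illustrative sketch under stated assumptions, not the authors' LCC-UCB code: the function name `ucb_run`, the Bernoulli reward model, and the UCB1 exploration bonus sqrt(2 ln t / n) are all choices made here for illustration.

```python
import numpy as np


def ucb_run(K, T, rng):
    """One run of UCB1 on K Bernoulli arms whose means are drawn U(0, 1).

    Returns the cumulative pseudo-regret after T steps.
    """
    mu = rng.uniform(0.0, 1.0, size=K)  # hidden arm means, as in the paper's setup
    counts = np.zeros(K)
    sums = np.zeros(K)
    best = mu.max()
    regret = 0.0
    for t in range(1, T + 1):
        if t <= K:
            arm = t - 1  # initialization: pull each arm once
        else:
            # UCB1 index: empirical mean plus exploration bonus
            index = sums / counts + np.sqrt(2.0 * np.log(t) / counts)
            arm = int(np.argmax(index))
        reward = float(rng.random() < mu[arm])  # Bernoulli reward with mean mu[arm]
        counts[arm] += 1
        sums[arm] += reward
        regret += best - mu[arm]
    return regret


if __name__ == "__main__":
    # Average regret over a few independent runs (the paper reports 30)
    rng = np.random.default_rng(0)
    runs = [ucb_run(K=10, T=5000, rng=rng) for _ in range(5)]
    print(f"mean regret over {len(runs)} runs: {np.mean(runs):.1f}")
```

A smaller horizon and fewer runs are used here than the paper's T = 10^5 and 30 runs, purely to keep the sketch fast; scaling the loop parameters up recovers the paper's single-agent baseline.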