The Bandit Whisperer: Communication Learning for Restless Bandits

Authors: Yunfan Zhao, Tonghan Wang, Dheeraj Mysore Nagaraj, Aparna Taneja, Milind Tambe

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Both theoretical and empirical evidence validate the effectiveness of our method in significantly improving RMAB performance across diverse problems. Empirically, we validate our method on synthetic problems, as well as the ARMMAN maternal healthcare (Verma et al. 2024) and the SIS epidemic intervention problem (Yaesoubi and Cohen 2011) built upon real-world data. Results show that our method significantly outperforms both the non-communicative learning approach and the approach with fixed communication strategies, achieving performance levels comparable to those learning in noise-free environments. Figure 1: Performance (interquartile mean (Agarwal et al. 2021) and standard error of return over 200 random seeds) of our method, baselines, and ablations in three environments with different numbers of arms N and resource budgets B.
Researcher Affiliation | Collaboration | Yunfan Zhao 1,2*, Tonghan Wang 1*, Dheeraj Mysore Nagaraj 3, Aparna Taneja 3, Milind Tambe 1,3. Affiliations: 1 Harvard University, 2 GE Healthcare, 3 Google DeepMind.
Pseudocode | No | The paper describes algorithms using mathematical equations and textual explanations, but it does not contain a dedicated section labeled "Pseudocode" or "Algorithm," nor does it present structured code-like blocks.
Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to a code repository or mention code in supplementary materials.
Open Datasets | Yes | SIS Epidemic Model (Yaesoubi and Cohen 2011; Killian et al. 2022; Zhao et al. 2024a): Each arm p represents a subpopulation in a geographic region, with the number of uninfected individuals being the state s_p. ARMMAN: The dataset is collected by ARMMAN (ARMMAN 2019), an NGO in India aimed at improving health awareness for a million expectant and new mothers via automated voice messaging programs. The transition dynamics is established empirically from the data (data usage and consent are in Appendix A.2).
Dataset Splits | No | The paper states, "In all domains, 80% of the arms are noisy," but it does not provide specific training, validation, or test dataset splits (e.g., percentages, sample counts, or references to standard splits) for the data used to train the models.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or other computing specifications used for running the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., programming languages, libraries, or solvers).
Experiment Setup | No | The paper states, "We discuss additional experimental details and hyperparameters in Appendix B," but specific hyperparameter values and training configurations are not detailed in the main body of the paper.