Multi-Armed Bandits with Interference: Bridging Causal Inference and Adversarial Bandits
Authors: Su Jia, Peter I. Frazier, Nathan Kallus
ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We visualize the results in Figures 4 and 5. Consistent with the theoretical analysis, CR outperforms SB in terms of 95% percentile regret, without sacrificing mean regret. Moreover, we observe that when N = T 3, the regret exhibits a smaller deviation. Finally, we observe that when N = T 3, the 95 regret percentile of CR is lower compared to the N = T 2 case. This is reasonable since a larger N helps reduce the variance in the reward estimation. |
| Researcher Affiliation | Academia | 1Center for Data Science for Enterprise and Society, Cornell University 2School of Operations Research and Information Engineering, Cornell University. |
| Pseudocode | Yes | Algorithm 1 EXP3-HT-IX Policy 1: Input: η (0, 1): learning rate, β [0, 1/2): IX parameter for the HT-IX estimator, (ℓ, r): parameters for the RRP. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide links to a code repository. |
| Open Datasets | No | We consider a 2-armed setting with N units lying on a N lattice. We generate unit-level interference as follows. Each unit u [N] is assigned a random reward Rut. |
| Dataset Splits | No | The paper describes generating synthetic data and running experiments on 'instances' but does not provide specific training/test/validation dataset splits for a pre-existing dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper mentions comparing 'EXP3-IX' and 'EXP3-IX-HT' policies, but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, specific libraries). |
| Experiment Setup | Yes | We consider a 2-armed setting with N units lying on a N lattice. We generate unit-level interference as follows. Each unit u [N] is assigned a random reward Rut. Let ρut be the proportion of the five immediate neighbors (counting u itself) assigned arm 1 at time t. Then, the reward at u is (2ρut 1)cut. In two sets of experiments, we assume that N = T 2 and N = T 3 respectively, and let T range from 10, 20, . . . , 50. For each fixed N, T, we randomly generate 100 instances. To necessitate exploration, we add large-scale non-stationarity by randomly generating drifts. Each drift is an 8-piecewise constant function, where the value on each piece is independently drawn from U(0, 1). To align with the theoretical analysis, we partition the lattice into square-shaped clusters of side length N 1/4. For simplicity, we perform a simplified version of the clustering without randomly assigning the boundary units to nearby clusters. We compare the switchback version of EXP3-IX (denoted SB) and our clustered randomization-based policy, dubbed EXP3-IX-HT, and denoted CR in the figures. We evaluated the performance of the two policies by running each of them 200 times for each instance. |