Stochastic Bandits Robust to Adversarial Attacks
Authors: Xuchuang Wang, Maoli Liu, Jinhang Zuo, Xutong Liu, John C.S. Lui, Mohammad Hajiesmaili
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiment. Besides the above analytical results, we also conduct experiments to validate the performance of our algorithms. The attacker employs the attack strategy proposed in Jun et al. (2018) when there are remaining attack budgets. We compare the regrets of our algorithms with prior algorithms under different attack budgets. The results show that our algorithms outperform the state-of-the-art algorithms in most cases, especially when the attack budget is large. We provide more details in Section 6. [...] Figures 3 and 4 show the regret of algorithms under known and unknown attack budgets. |
| Researcher Affiliation | Academia | Xuchuang Wang (CICS, UMass Amherst); Maoli Liu (CSE, CUHK); Jinhang Zuo (CS, City U); Xutong Liu (ECE, CMU); John C.S. Lui (CSE, CUHK); Mohammad Hajiesmaili (CICS, UMass Amherst) |
| Pseudocode | Yes | Algorithm 2 SE-WR: Successive Elimination with Wide Confidence Radius [...] Algorithm 3 PE-WR: Phase Elimination with Wide Confidence Radius [...] Algorithm 4 MS-SE-WR: Model Selection with Wide Confidence Radius [...] Algorithm 6 SE-WR-Stop: Successive Elimination with Wide Confidence Radius and Stop Condition |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository. |
| Open Datasets | No | We consider a stochastic bandit with K = 10 arms, where each arm k follows a Bernoulli distribution with means µk decreasing from 0.9 to 0.45 in steps of 0.05. We evaluate all algorithms under four attack budgets (C ∈ {0, 3000, 6000, 9000}) over T = 200000 rounds, repeating each experiment 10 times and reporting the average and deviation of cumulative regret. Additional results can be found in Appendix G. |
| Dataset Splits | No | The paper describes a simulated environment with specific parameters (K=10 arms, Bernoulli distributions for rewards, T=200000 rounds) and experimental repetitions (10 times) rather than using a pre-existing dataset with defined splits for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU/CPU models, memory, or cloud computing resources. |
| Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers that would be required to replicate the experiments. |
| Experiment Setup | Yes | We consider a stochastic bandit with K = 10 arms, where each arm k follows a Bernoulli distribution with means µk decreasing from 0.9 to 0.45 in steps of 0.05. We evaluate all algorithms under four attack budgets (C ∈ {0, 3000, 6000, 9000}) over T = 200000 rounds, repeating each experiment 10 times and reporting the average and deviation of cumulative regret. [...] We adopt the attack strategy from Jun et al. (2018) (details in Appendix G.1), a standard attack policy for stochastic bandits. [...] We set ∆0 = 0.1, σ = 1, and δ = 0.05 in the attack model. |
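The experiment-setup row is specific enough to reconstruct the simulated environment. Below is a minimal sketch of that environment: K = 10 Bernoulli arms with means 0.90, 0.85, ..., 0.45, a horizon T, and a budget-capped attacker. The attacker here is a simplified stand-in (it depresses the best arm's reward while budget remains); the paper itself uses the more sophisticated strategy of Jun et al. (2018). The uniform-random policy and all function names are ours, for illustration only.

```python
import numpy as np

def run_attacked_bandit(T=200_000, C=3000.0, seed=0):
    """Sketch of the paper's simulation setup: K = 10 Bernoulli arms
    with means 0.90 down to 0.45 (step 0.05), horizon T, attack budget C.

    The attacker below is a simplified stand-in, not the Jun et al.
    (2018) strategy used in the paper: it zeroes out rewards of the
    best arm until the budget C is exhausted.
    """
    rng = np.random.default_rng(seed)
    means = 0.90 - 0.05 * np.arange(10)   # arm means 0.90, 0.85, ..., 0.45
    spent = 0.0                           # attack budget consumed so far
    total_reward = 0.0
    for _ in range(T):
        k = int(rng.integers(10))         # placeholder: uniform-random policy
        r = float(rng.random() < means[k])  # Bernoulli reward
        if k == 0 and spent < C:          # corrupt pulls of the best arm
            delta = min(r, C - spent)     # push the observed reward toward 0
            r -= delta
            spent += delta
        total_reward += r
    return means, spent, total_reward
```

Swapping the uniform policy for a bandit algorithm (and the stand-in attacker for the Jun et al. strategy) would reproduce the structure of the paper's experiments, though not their exact numbers.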
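The pseudocode row names "Successive Elimination with Wide Confidence Radius" (SE-WR, Algorithm 2). A minimal sketch of that idea, under our own assumptions: standard successive elimination, with the usual confidence radius widened by a budget-dependent term C / n_k (attack budget over pull count), so that an attacker spending at most C cannot force the elimination of the best arm. The exact widening term, constants, and stopping details in the paper's Algorithm 2 may differ; this is an illustration of the technique, not the authors' implementation.

```python
import math
import numpy as np

def se_wr(means, T, C, rng, sigma=1.0):
    """Successive elimination with a widened confidence radius (sketch).

    The widening term C / n_k is our illustrative reading of the
    'wide radius' idea; the paper's exact radius may differ.
    Returns the surviving arm set and the cumulative (pseudo-)regret.
    """
    K = len(means)
    active = list(range(K))
    pulls = np.zeros(K)
    sums = np.zeros(K)
    regret = 0.0
    best = float(np.max(means))
    t = 0
    while t < T:
        # pull every surviving arm once per round
        for k in list(active):
            if t >= T:
                break
            r = float(rng.random() < means[k])  # Bernoulli reward
            sums[k] += r
            pulls[k] += 1
            regret += best - means[k]
            t += 1
        # widened radius: standard sub-Gaussian term plus C / n_k
        n = pulls[active]
        rad = np.sqrt(2 * sigma**2 * math.log(T) / n) + C / n
        mu = sums[active] / n
        lcb_max = float(np.max(mu - rad))
        # eliminate arms whose upper bound falls below the best lower bound
        active = [k for k, m, r_ in zip(active, mu, rad) if m + r_ >= lcb_max]
    return active, regret
```

With C = 0 the radius reduces to plain successive elimination; larger C makes elimination more conservative, trading extra exploration for robustness to corruption.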