Stochastic Bandits Robust to Adversarial Attacks

Authors: Xuchuang Wang, Maoli Liu, Jinhang Zuo, Xutong Liu, John C.S. Lui, Mohammad Hajiesmaili

ICLR 2025

Reproducibility
Variable | Result | LLM Response
Research Type | Experimental | "Experiment. Besides the above analytical results, we also conduct experiments to validate the performance of our algorithms. The attacker employs the attack strategy proposed in Jun et al. (2018) when there are remaining attack budgets. We compare the regrets of our algorithms with prior algorithms under different attack budgets. The results show that our algorithms outperform the state-of-the-art algorithms in most cases, especially when the attack budget is large. We provide more details in Section 6. [...] Figures 3 and 4 show the regret of algorithms under known and unknown attack budgets."
Researcher Affiliation | Academia | Xuchuang Wang (CICS, UMass Amherst); Maoli Liu (CSE, CUHK); Jinhang Zuo (CS, City U); Xutong Liu (ECE, CMU); John C.S. Lui (CSE, CUHK); Mohammad Hajiesmaili (CICS, UMass Amherst)
Pseudocode | Yes | "Algorithm 2 SE-WR: Successive Elimination with Wide Confidence Radius [...] Algorithm 3 PE-WR: Phase Elimination with Wide Confidence Radius [...] Algorithm 4 MS-SE-WR: Model Selection with Wide Confidence Radius [...] Algorithm 6 SE-WR-Stop: Successive Elimination with Wide Confidence Radius and Stop Condition"
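The "wide confidence radius" idea behind SE-WR can be illustrated with a minimal sketch: standard successive elimination keeps every arm whose upper confidence bound exceeds the best lower confidence bound, and the radius is widened by a `C/n` term so that a corruption budget of `C` cannot push the optimal arm out of the active set. This is an illustrative reconstruction under our own assumptions (the `sqrt(2 log T / n)` base radius, the round-robin schedule, and the elimination rule are not the paper's exact constants):

```python
import numpy as np

def se_wr(means, T, C, rng):
    """Sketch of successive elimination with a wide confidence radius.

    The usual sqrt(log T / n) radius is widened by a C/n term, so an
    attacker with total budget C cannot eliminate the optimal arm.
    Constants and schedule are illustrative, not the paper's exact ones.
    """
    K = len(means)
    active = list(range(K))
    counts, sums = np.zeros(K), np.zeros(K)
    regret, t, best = 0.0, 0, max(means)
    while t < T:
        for k in list(active):              # round-robin over surviving arms
            if t >= T:
                break
            r = rng.binomial(1, means[k])   # Bernoulli reward (no attack here)
            sums[k] += r
            counts[k] += 1
            regret += best - means[k]
            t += 1
        idx = np.array(active)
        mu = sums[idx] / counts[idx]
        # wide radius: standard term plus a C/n corruption allowance
        rad = np.sqrt(2 * np.log(max(T, 2)) / counts[idx]) + C / counts[idx]
        active = [a for a, m, r_ in zip(active, mu, rad)
                  if m + r_ >= (mu - rad).max()]  # keep plausibly-best arms
    return active, regret
```

With `C = 0` this reduces to plain successive elimination; a large `C` keeps the radius wide enough that no arm is ever eliminated, which is the robustness/exploration trade-off the paper's analysis quantifies.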
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository.
Open Datasets | No | "We consider a stochastic bandit with K = 10 arms, where each arm k follows a Bernoulli distribution with means µk decreasing from 0.9 to 0.45 in steps of 0.05. We evaluate all algorithms under four attack budgets (C ∈ {0, 3000, 6000, 9000}) over T = 200000 rounds, repeating each experiment 10 times and reporting the average and deviation of cumulative regret. Additional results can be found in Appendix G."
Dataset Splits | No | The paper describes a simulated environment with specific parameters (K = 10 arms, Bernoulli reward distributions, T = 200000 rounds) and experimental repetitions (10 times) rather than a pre-existing dataset with defined training, validation, or test splits.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU/CPU models, memory, or cloud computing resources.
Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers that would be required to replicate the experiments.
Experiment Setup | Yes | "We consider a stochastic bandit with K = 10 arms, where each arm k follows a Bernoulli distribution with means µk decreasing from 0.9 to 0.45 in steps of 0.05. We evaluate all algorithms under four attack budgets (C ∈ {0, 3000, 6000, 9000}) over T = 200000 rounds, repeating each experiment 10 times and reporting the average and deviation of cumulative regret. [...] We adopt the attack strategy from Jun et al. (2018) (details in Appendix G.1), a standard attack policy for stochastic bandits. [...] We set ∆0 = 0.1, σ = 1, and δ = 0.05 in the attack model."
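The reported environment (K = 10 Bernoulli arms with means 0.90 down to 0.45, a fixed attack budget) is small enough to sketch as a simulation harness. The attacker below is a simplified budget-constrained corruption standing in for the Jun et al. (2018) strategy, whose exact perturbation rule we do not reproduce; the target arm, `alpha`, and the corruption rule are our assumptions:

```python
import numpy as np

def make_env(seed=0):
    # K = 10 Bernoulli arms, means 0.90, 0.85, ..., 0.45 (the paper's setup)
    means = np.round(np.arange(0.90, 0.449, -0.05), 2)
    rng = np.random.default_rng(seed)
    return means, rng

def attacked_reward(k, means, rng, budget, target=9, alpha=1.0):
    """Simplified budget-constrained attacker (a stand-in for the
    Jun et al. (2018) strategy): while budget remains, the reward of
    any non-target arm is pushed down by up to `alpha`. `budget` is a
    one-element list so the remaining budget persists across calls.
    """
    r = float(rng.binomial(1, means[k]))
    if budget[0] > 0 and k != target:
        spend = min(alpha, budget[0], r)  # corrupt within the remaining budget
        budget[0] -= spend
        r -= spend
    return r
```

A learner interacting with `attacked_reward` under `budget = [3000.0]` (one of the paper's four budgets) sees corrupted feedback only until the budget is exhausted, after which rewards revert to the clean Bernoulli draws; this is the regime in which the paper compares regret across algorithms.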