Stochastic Bandits Robust to Adversarial Attacks

Authors: Xuchuang Wang, Maoli Liu, Jinhang Zuo, Xutong Liu, John C.S. Lui, Mohammad Hajiesmaili

ICLR 2025

Reproducibility
Variable | Result | LLM Response
Research Type | Experimental | "Experiment. Besides the above analytical results, we also conduct experiments to validate the performance of our algorithms. The attacker employs the attack strategy proposed in Jun et al. (2018) when there are remaining attack budgets. We compare the regrets of our algorithms with prior algorithms under different attack budgets. The results show that our algorithms outperform the state-of-the-art algorithms in most cases, especially when the attack budget is large. We provide more details in Section 6. [...] Figures 3 and 4 show the regret of algorithms under known and unknown attack budgets."
Researcher Affiliation | Academia | Xuchuang Wang (CICS, UMass Amherst); Maoli Liu (CSE, CUHK); Jinhang Zuo (CS, City U); Xutong Liu (ECE, CMU); John C.S. Lui (CSE, CUHK); Mohammad Hajiesmaili (CICS, UMass Amherst)
Pseudocode | Yes | "Algorithm 2 SE-WR: Successive Elimination with Wide Confidence Radius [...] Algorithm 3 PE-WR: Phase Elimination with Wide Confidence Radius [...] Algorithm 4 MS-SE-WR: Model Selection with Wide Confidence Radius [...] Algorithm 6 SE-WR-Stop: Successive Elimination with Wide Confidence Radius and Stop Condition"
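The "wide confidence radius" idea behind SE-WR can be illustrated with a minimal sketch: standard successive elimination keeps every arm whose upper confidence bound exceeds the best lower confidence bound, and the radius is widened by a `C/n` term so that a corruption budget of `C` cannot push the optimal arm out of the active set. This is an illustrative reconstruction under our own assumptions (the `sqrt(2 log T / n)` base radius, the round-robin schedule, and the elimination rule are not the paper's exact constants):

```python
import numpy as np

def se_wr(means, T, C, rng):
    """Sketch of successive elimination with a wide confidence radius.

    The usual sqrt(log T / n) radius is widened by a C/n term, so an
    attacker with total budget C cannot eliminate the optimal arm.
    Constants and schedule are illustrative, not the paper's exact ones.
    """
    K = len(means)
    active = list(range(K))
    counts, sums = np.zeros(K), np.zeros(K)
    regret, t, best = 0.0, 0, max(means)
    while t < T:
        for k in list(active):              # round-robin over surviving arms
            if t >= T:
                break
            r = rng.binomial(1, means[k])   # Bernoulli reward (no attack here)
            sums[k] += r
            counts[k] += 1
            regret += best - means[k]
            t += 1
        idx = np.array(active)
        mu = sums[idx] / counts[idx]
        # wide radius: standard term plus a C/n corruption allowance
        rad = np.sqrt(2 * np.log(max(T, 2)) / counts[idx]) + C / counts[idx]
        active = [a for a, m, r_ in zip(active, mu, rad)
                  if m + r_ >= (mu - rad).max()]  # keep plausibly-best arms
    return active, regret
```

With `C = 0` this reduces to plain successive elimination; a large `C` keeps the radius wide enough that no arm is ever eliminated, which is the robustness/exploration trade-off the paper's analysis quantifies.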
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository.
Open Datasets | No | "We consider a stochastic bandit with K = 10 arms, where each arm k follows a Bernoulli distribution with means µk decreasing from 0.9 to 0.45 in steps of 0.05. We evaluate all algorithms under four attack budgets (C ∈ {0, 3000, 6000, 9000}) over T = 200000 rounds, repeating each experiment 10 times and reporting the average and deviation of cumulative regret. Additional results can be found in Appendix G."
Dataset Splits | No | The paper describes a simulated environment with specific parameters (K = 10 arms, Bernoulli reward distributions, T = 200000 rounds) and experimental repetitions (10 times) rather than a pre-existing dataset with defined training, validation, or test splits.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU/CPU models, memory, or cloud computing resources.
Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers that would be required to replicate the experiments.
Experiment Setup | Yes | "We consider a stochastic bandit with K = 10 arms, where each arm k follows a Bernoulli distribution with means µk decreasing from 0.9 to 0.45 in steps of 0.05. We evaluate all algorithms under four attack budgets (C ∈ {0, 3000, 6000, 9000}) over T = 200000 rounds, repeating each experiment 10 times and reporting the average and deviation of cumulative regret. [...] We adopt the attack strategy from Jun et al. (2018) (details in Appendix G.1), a standard attack policy for stochastic bandits. [...] We set ∆0 = 0.1, σ = 1, and δ = 0.05 in the attack model."
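The reported environment (K = 10 Bernoulli arms with means 0.90 down to 0.45, a fixed attack budget) is small enough to sketch as a simulation harness. The attacker below is a simplified budget-constrained corruption standing in for the Jun et al. (2018) strategy, whose exact perturbation rule we do not reproduce; the target arm, `alpha`, and the corruption rule are our assumptions:

```python
import numpy as np

def make_env(seed=0):
    # K = 10 Bernoulli arms, means 0.90, 0.85, ..., 0.45 (the paper's setup)
    means = np.round(np.arange(0.90, 0.449, -0.05), 2)
    rng = np.random.default_rng(seed)
    return means, rng

def attacked_reward(k, means, rng, budget, target=9, alpha=1.0):
    """Simplified budget-constrained attacker (a stand-in for the
    Jun et al. (2018) strategy): while budget remains, the reward of
    any non-target arm is pushed down by up to `alpha`. `budget` is a
    one-element list so the remaining budget persists across calls.
    """
    r = float(rng.binomial(1, means[k]))
    if budget[0] > 0 and k != target:
        spend = min(alpha, budget[0], r)  # corrupt within the remaining budget
        budget[0] -= spend
        r -= spend
    return r
```

A learner interacting with `attacked_reward` under `budget = [3000.0]` (one of the paper's four budgets) sees corrupted feedback only until the budget is exhausted, after which rewards revert to the clean Bernoulli draws; this is the regime in which the paper compares regret across algorithms.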