Action Poisoning Attacks on Linear Contextual Bandits
Authors: Guanlin Liu, Lifeng Lai
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we provide numerical examples to illustrate the impact of the proposed action poisoning attack schemes. We first empirically evaluate the performance of the proposed action poisoning attack schemes on three contextual bandit algorithms: LinUCB (Abbasi-Yadkori et al., 2011), LinTS (Agrawal & Goyal, 2013), and ϵ-Greedy. We run the experiments on three datasets: Synthetic data: the dimension of the contexts and the coefficient vectors is d = 6. Jester dataset (Goldberg et al., 2001). MovieLens 25M dataset (Harper & Konstan, 2015). We set δ = 0.1 and λ = 2. For all the experiments, we set the total number of rounds T = 10^6 and the number of arms K = 10. We independently run ten repeated experiments; the reported results are averaged over the ten experiments. We set α to 0.2 for the two proposed attack strategies, hence the target arm may be the worst arm in some rounds. Each individual experimental run takes up to 10 minutes on one physical CPU core (Intel Core i7-8700). The results are shown in Table 1 and Figure 2. These experiments show that the action poisoning attacks can force the three agents to pull the target arm very frequently, while the agents rarely pull the target arm under no attack. |
| Researcher Affiliation | Academia | Guanlin Liu EMAIL Department of Electrical and Computer Engineering University of California, Davis Lifeng Lai EMAIL Department of Electrical and Computer Engineering University of California, Davis |
| Pseudocode | Yes | Algorithm 1 Action poisoning attacks on contextual linear bandit agent... Algorithm 2 Contextual LinUCB (Li et al., 2010)... Algorithm 3 UCB-GLM (Li et al., 2017) |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating the release of source code for the methodology described. It discusses the methodology and presents numerical experiments without providing code access. |
| Open Datasets | Yes | We run the experiments on three datasets: Synthetic data: the dimension of the contexts and the coefficient vectors is d = 6. We set the first entry of every context and coefficient vector to 1. The other entries of every context and coefficient vector are uniformly drawn from (-1/(d-1), 1/(d-1)). Thus, ||x||_2 <= 2 and the mean rewards satisfy <x, θ> > 0. The reward noise ηt is drawn from a Gaussian distribution N(0, 0.01). Jester dataset (Goldberg et al., 2001): Jester contains 4.1 million ratings of jokes, with rating values scaling from -10.00 to +10.00. MovieLens 25M dataset (Harper & Konstan, 2015): the MovieLens 25M dataset contains 25 million 5-star ratings of 62,000 movies by 162,000 users. |
| Dataset Splits | No | The paper describes using synthetic data, Jester, and MovieLens datasets for sequential contextual bandit experiments over a total of T = 10^6 rounds. While it details data generation and preprocessing, it does not specify traditional train/test/validation splits, which are less common for sequential bandit problems that involve continuous interaction with an environment. |
| Hardware Specification | Yes | Each of the individual experimental runs costs up to 10 minutes on one physical CPU core. The type of CPU is Intel Core i7-8700. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We set δ = 0.1 and λ = 2. For all the experiments, we set the total number of rounds T = 10^6 and the number of arms K = 10. We independently run ten repeated experiments; the reported results are averaged over the ten experiments. We set α to 0.2 for the two proposed attack strategies, hence the target arm may be the worst arm in some rounds. |
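The synthetic setup described above (d = 6 contexts with the first entry fixed to 1, the remaining entries uniform in (-1/(d-1), 1/(d-1)), reward noise N(0, 0.01), K = 10 arms, λ = 2, δ = 0.1) can be sketched as a minimal LinUCB loop. This is a hedged illustration, not the authors' code: the horizon is shortened from the paper's T = 10^6, no poisoning attack is applied, and the exploration bonus `beta` is a standard self-normalized confidence radius that may differ in constants from the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 6, 10, 2000            # paper uses T = 1e6; shortened for illustration
lam, delta = 2.0, 0.1            # regularization and confidence level from the paper

def sample_vec():
    """Context/coefficient vector per the paper: first entry 1,
    the rest uniform in (-1/(d-1), 1/(d-1))."""
    v = rng.uniform(-1 / (d - 1), 1 / (d - 1), size=d)
    v[0] = 1.0
    return v

theta = sample_vec()             # unknown coefficient vector of the environment

A = lam * np.eye(d)              # regularized Gram matrix
b = np.zeros(d)
pulls = np.zeros(K, dtype=int)

for t in range(1, T + 1):
    contexts = np.array([sample_vec() for _ in range(K)])
    theta_hat = np.linalg.solve(A, b)          # ridge estimate of theta
    # Standard self-normalized confidence radius (constants are an assumption)
    beta = np.sqrt(lam) + np.sqrt(2 * np.log(1 / delta)
                                  + d * np.log(1 + t / (lam * d)))
    A_inv = np.linalg.inv(A)
    bonus = np.sqrt(np.einsum('ij,jk,ik->i', contexts, A_inv, contexts))
    arm = int(np.argmax(contexts @ theta_hat + beta * bonus))
    # Reward noise N(0, 0.01) has standard deviation 0.1
    reward = contexts[arm] @ theta + rng.normal(0, 0.1)
    A += np.outer(contexts[arm], contexts[arm])
    b += reward * contexts[arm]
    pulls[arm] += 1
```

Under an α = 0.2 action poisoning attack, the attacker would additionally intercept `arm` before the environment step and substitute a different arm's action in a fraction of rounds, which is the mechanism the paper's Algorithm 1 formalizes.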