Action Poisoning Attacks on Linear Contextual Bandits
Authors: Guanlin Liu, Lifeng Lai
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we provide numerical examples to illustrate the impact of the proposed action poisoning attack schemes. We first empirically evaluate the performance of the proposed action poisoning attack schemes on three contextual bandit algorithms: LinUCB (Abbasi-Yadkori et al., 2011), LinTS (Agrawal & Goyal, 2013), and ϵ-Greedy. We run the experiments on three datasets: Synthetic data: the dimension of the contexts and the coefficient vectors is d = 6. Jester dataset (Goldberg et al., 2001). MovieLens 25M dataset (Harper & Konstan, 2015). We set δ = 0.1 and λ = 2. For all the experiments, we set the total number of rounds T = 10^6 and the number of arms K = 10. We independently run ten repeated experiments; the reported results are averaged over the ten experiments. We set α to 0.2 for the two proposed attack strategies, hence the target arm may be the worst arm in some rounds. Each individual experimental run takes up to 10 minutes on one physical CPU core (Intel Core i7-8700). The results are shown in Table 1 and Figure 2. These experiments show that the action poisoning attacks can force the three agents to pull the target arm very frequently, while the agents rarely pull the target arm under no attack. |
| Researcher Affiliation | Academia | Guanlin Liu EMAIL Department of Electrical and Computer Engineering University of California, Davis Lifeng Lai EMAIL Department of Electrical and Computer Engineering University of California, Davis |
| Pseudocode | Yes | Algorithm 1 Action poisoning attacks on contextual linear bandit agent... Algorithm 2 Contextual LinUCB (Li et al., 2010)... Algorithm 3 UCB-GLM (Li et al., 2017) |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating the release of source code for the methodology described. It discusses the methodology and presents numerical experiments without providing code access. |
| Open Datasets | Yes | We run the experiments on three datasets: Synthetic data: the dimension of the contexts and the coefficient vectors is d = 6. We set the first entry of every context and coefficient vector to 1. The other entries of every context and coefficient vector are uniformly drawn from (-1/(d-1), 1/(d-1)). Thus, ||x||_2 <= 2 and the mean rewards satisfy <x, θ> > 0. The reward noise ηt is drawn from a Gaussian distribution N(0, 0.01). Jester dataset (Goldberg et al., 2001): Jester contains 4.1 million ratings of jokes, with rating values scaling from -10.00 to +10.00. MovieLens 25M dataset (Harper & Konstan, 2015): the MovieLens 25M dataset contains 25 million 5-star ratings of 62,000 movies by 162,000 users. |
| Dataset Splits | No | The paper describes using synthetic data, Jester, and MovieLens datasets for sequential contextual bandit experiments over a total of T = 10^6 rounds. While it details data generation and preprocessing, it does not specify traditional train/test/validation splits, which are less common for sequential bandit problems that involve continuous interaction with an environment. |
| Hardware Specification | Yes | Each of the individual experimental runs costs up to 10 minutes on one physical CPU core. The type of CPU is Intel Core i7-8700. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We set δ = 0.1 and λ = 2. For all the experiments, we set the total number of rounds T = 10^6 and the number of arms K = 10. We independently run ten repeated experiments; the reported results are averaged over the ten experiments. We set α to 0.2 for the two proposed attack strategies, hence the target arm may be the worst arm in some rounds. |
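The synthetic setup described above (d = 6 contexts with the first entry fixed to 1, the remaining entries uniform in (-1/(d-1), 1/(d-1)), reward noise N(0, 0.01), K = 10 arms, λ = 2, δ = 0.1) can be sketched as a minimal LinUCB loop. This is a hedged illustration, not the authors' code: the horizon is shortened from the paper's T = 10^6, no poisoning attack is applied, and the exploration bonus `beta` is a standard self-normalized confidence radius that may differ in constants from the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 6, 10, 2000            # paper uses T = 1e6; shortened for illustration
lam, delta = 2.0, 0.1            # regularization and confidence level from the paper

def sample_vec():
    """Context/coefficient vector per the paper: first entry 1,
    the rest uniform in (-1/(d-1), 1/(d-1))."""
    v = rng.uniform(-1 / (d - 1), 1 / (d - 1), size=d)
    v[0] = 1.0
    return v

theta = sample_vec()             # unknown coefficient vector of the environment

A = lam * np.eye(d)              # regularized Gram matrix
b = np.zeros(d)
pulls = np.zeros(K, dtype=int)

for t in range(1, T + 1):
    contexts = np.array([sample_vec() for _ in range(K)])
    theta_hat = np.linalg.solve(A, b)          # ridge estimate of theta
    # Standard self-normalized confidence radius (constants are an assumption)
    beta = np.sqrt(lam) + np.sqrt(2 * np.log(1 / delta)
                                  + d * np.log(1 + t / (lam * d)))
    A_inv = np.linalg.inv(A)
    bonus = np.sqrt(np.einsum('ij,jk,ik->i', contexts, A_inv, contexts))
    arm = int(np.argmax(contexts @ theta_hat + beta * bonus))
    # Reward noise N(0, 0.01) has standard deviation 0.1
    reward = contexts[arm] @ theta + rng.normal(0, 0.1)
    A += np.outer(contexts[arm], contexts[arm])
    b += reward * contexts[arm]
    pulls[arm] += 1
```

Under an α = 0.2 action poisoning attack, the attacker would additionally intercept `arm` before the environment step and substitute a different arm's action in a fraction of rounds, which is the mechanism the paper's Algorithm 1 formalizes.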