Optimal Learning Policies for Differential Privacy in Multi-armed Bandits
Authors: Siwei Wang, Jun Zhu
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we mainly consider the differential privacy case, and compare our algorithms with state-of-the-art baselines, including DP-UCB, DP-UCB-BOUND, DP-UCB-INT in the work of Tossou and Dimitrakakis (2016) and DP-SE in the work of Sajed and Sheffet (2019). We consider two problem instances. In Instance 1 we set N = 9 and the expected reward vector is [0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7]; in Instance 2 we set N = 101 and the expected reward of arm i ∈ [N] is µ_i = 0.3 + 0.004(i − 1). In both instances, we choose T = 10^6, and all the results (the average cumulative regrets and the standard deviations of cumulative regrets) are averaged over 100 independent runs. |
| Researcher Affiliation | Collaboration | Siwei Wang, Microsoft Research Asia, Beijing, China; Jun Zhu, Department of Computer Science and Technology, BNRist Center, Tsinghua AI Institute, Tsinghua–Bosch Joint ML Center, Tsinghua University, Beijing, China |
| Pseudocode | Yes | Algorithm 1: DP-FTPL-Gauss; Algorithm 2: DP-FTPL-Beta; Algorithm 3: DP-FTPL-New; Algorithm 4: GDP-Elim-New |
| Open Source Code | No | The paper does not provide any explicit statement about open-sourcing the code, a repository link, or mention of code in supplementary materials. |
| Open Datasets | No | We consider two problem instances. In Instance 1 we set N = 9 and the expected reward vector is [0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7]; in Instance 2 we set N = 101 and the expected reward of arm i ∈ [N] is µ_i = 0.3 + 0.004(i − 1). These problem instances are custom-defined within the paper's experimental section and are not external, publicly available datasets requiring a link or citation. |
| Dataset Splits | No | The paper describes problem instances for multi-armed bandits and reports results over "100 independent runs." This context does not involve traditional training/test/validation dataset splits used in other machine learning paradigms. |
| Hardware Specification | No | The paper mentions that "This work was supported by...the High Performance Computing Center, Tsinghua University." However, it does not provide specific details about the hardware used, such as GPU/CPU models, memory, or other configurations. |
| Software Dependencies | No | The paper does not provide specific software dependencies, such as programming languages, libraries, or frameworks with their version numbers. |
| Experiment Setup | Yes | In both instances, we choose T = 10^6, and all the results (the average cumulative regrets and the standard deviations of cumulative regrets) are averaged over 100 independent runs. We use Instance 1 and set (ϵ, δ) to be (1, 0.01) or (1, e^−10). In Fig. 4(c) and Fig. 4(d), we use Instance 2 and set (ϵ, δ) to be (1, 0.01) or (1, e^−10). Algorithm 1 (DP-FTPL-Gauss): for each arm i, set N_i = 0, R_i = 0. While N_i < N_G = min{ 1/(4πδ²), (1/ϵ²) log(e/(4πδ²)) } do: pull arm i(t) = i and observe reward x(t) = x_i(t) ∈ {0, 1}; R_i ← R_i + x_i(t), N_i ← N_i + 1; end while. |
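The experimental setup quoted above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it builds the two reward instances as described, and computes the initial uniform-exploration budget N_G = min{1/(4πδ²), (1/ϵ²)·log(e/(4πδ²))} from the quoted DP-FTPL-Gauss pseudocode for the two privacy settings used, (ϵ, δ) = (1, 0.01) and (1, e^−10). Bernoulli rewards are assumed, consistent with x_i(t) ∈ {0, 1}.

```python
import math

# Problem instances as described in the paper's experiments.
instance1 = [0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7]       # N = 9
instance2 = [0.3 + 0.004 * (i - 1) for i in range(1, 102)]          # N = 101, mu_i = 0.3 + 0.004(i - 1)

def exploration_budget(eps: float, delta: float) -> float:
    """Per-arm pull threshold N_G from the quoted DP-FTPL-Gauss pseudocode:
    N_G = min{ 1/(4*pi*delta^2), (1/eps^2) * log(e / (4*pi*delta^2)) }."""
    a = 1.0 / (4.0 * math.pi * delta ** 2)
    b = (1.0 / eps ** 2) * math.log(math.e / (4.0 * math.pi * delta ** 2))
    return min(a, b)

# The two privacy settings used in the experiments.
for delta in (0.01, math.exp(-10)):
    print(f"delta = {delta:.3e}: N_G = {exploration_budget(1.0, delta):.2f}")
```

With ϵ = 1, the logarithmic branch of the min is the binding one for both values of δ, so the exploration phase pulls each arm only a handful of times before the algorithm proceeds.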