Optimal Learning Policies for Differential Privacy in Multi-armed Bandits

Authors: Siwei Wang, Jun Zhu

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we mainly consider the differential privacy case, and compare our algorithms with state-of-the-art baselines, including DP-UCB, DP-UCB-BOUND, DP-UCB-INT in the work of Tossou and Dimitrakakis (2016) and DP-SE in the work of Sajed and Sheffet (2019). We consider two problem instances. In Instance 1 we set N = 9 and the expected reward vector is [0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7]; in Instance 2 we set N = 101 and the expected reward of arm i ∈ [N] is µ_i = 0.3 + 0.004(i − 1). In both instances, we choose T = 10^6, and all the results (the average cumulative regrets and the standard deviations of cumulative regrets) take an average over 100 independent runs.
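The two problem instances quoted above are easy to reconstruct. A minimal sketch (variable names are ours; the paper does not release code):

```python
import numpy as np

# Instance 1: N = 9 arms with evenly spaced means 0.3, 0.35, ..., 0.7.
means_1 = np.linspace(0.3, 0.7, 9)

# Instance 2: N = 101 arms with mu_i = 0.3 + 0.004 * (i - 1), i = 1..101.
means_2 = 0.3 + 0.004 * np.arange(101)

T = 10**6    # horizon
RUNS = 100   # independent runs averaged in the reported regrets
```

Both instances use Bernoulli-style bandits with many closely spaced arms, so the gap between the best and second-best arm is small (0.05 and 0.004 respectively), which stresses the exploration behavior of the compared algorithms.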
Researcher Affiliation | Collaboration | Siwei Wang (EMAIL), Microsoft Research Asia, Beijing, China; Jun Zhu (EMAIL), Department of Computer Science and Technology, BNRist Center, Tsinghua AI Institute, Tsinghua-Bosch Joint ML Center, Tsinghua University, Beijing, China
Pseudocode | Yes | Algorithm 1: DP-FTPL-Gauss; Algorithm 2: DP-FTPL-Beta; Algorithm 3: DP-FTPL-New; Algorithm 4: GDP-Elim-New
Open Source Code | No | The paper does not provide any explicit statement about open-sourcing the code, a repository link, or mention of code in supplementary materials.
Open Datasets | No | We consider two problem instances. In Instance 1 we set N = 9 and the expected reward vector is [0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7]; in Instance 2 we set N = 101 and the expected reward of arm i ∈ [N] is µ_i = 0.3 + 0.004(i − 1). These problem instances are custom-defined within the paper's experimental section and are not external, publicly available datasets requiring a link or citation.
Dataset Splits | No | The paper describes problem instances for multi-armed bandits and reports results over "100 independent runs." This context does not involve traditional training/test/validation dataset splits used in other machine learning paradigms.
Hardware Specification | No | The paper mentions that "This work was supported by...the High Performance Computing Center, Tsinghua University." However, it does not provide specific details about the hardware used, such as GPU/CPU models, memory, or other configurations.
Software Dependencies | No | The paper does not provide specific software dependencies, such as programming languages, libraries, or frameworks with their version numbers.
Experiment Setup | Yes | In both instances, we choose T = 10^6, and all the results (the average cumulative regrets and the standard deviations of cumulative regrets) take an average over 100 independent runs. We use Instance 1 and set (ϵ, δ) to be (1, 0.01) or (1, e^{-10}). In Fig. 4(c) and Fig. 4(d), we use Instance 2 and set (ϵ, δ) to be (1, 0.01) or (1, e^{-10}). Algorithm 1 DP-FTPL-Gauss: for each arm i, set N_i = 0, R_i = 0. while N_i < N_G = min{ 1/(4πδ²), (1/ϵ²) log( e/(4πδ²) ) } do: pull arm i(t) = i and observe reward x(t) = x_i(t) ∈ {0, 1}; R_i ← R_i + x_i(t); N_i ← N_i + 1. end while
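The quoted pseudocode is the uniform-exploration phase of DP-FTPL-Gauss: every arm is pulled until it has N_G samples, with N_G given by the min-expression in the while-condition. A minimal sketch, assuming Bernoulli {0, 1} rewards (the function and variable names are ours, not the paper's):

```python
import math
import random

def initial_exploration(means, eps, delta, rng=None):
    """Pull each arm until it has N_G samples, where
    N_G = ceil(min{ 1/(4*pi*delta^2), (1/eps^2) * log(e/(4*pi*delta^2)) }),
    following the threshold quoted from Algorithm 1 (DP-FTPL-Gauss).
    Returns per-arm pull counts N, per-arm reward sums R, and N_G."""
    if rng is None:
        rng = random.Random(0)
    n_g = math.ceil(min(1.0 / (4 * math.pi * delta**2),
                        (1.0 / eps**2) * math.log(math.e / (4 * math.pi * delta**2))))
    N = [0] * len(means)  # pull counts N_i
    R = [0] * len(means)  # reward sums R_i
    for i, mu in enumerate(means):
        while N[i] < n_g:
            x = 1 if rng.random() < mu else 0  # Bernoulli reward in {0, 1}
            R[i] += x
            N[i] += 1
    return N, R, n_g
```

For the paper's setting (ϵ, δ) = (1, 0.01), the log term is the smaller branch, so only a handful of forced pulls per arm are needed before the algorithm switches to its perturbed-leader phase.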