Optimal Learning Policies for Differential Privacy in Multi-armed Bandits
Authors: Siwei Wang, Jun Zhu
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we mainly consider the differential privacy case, and compare our algorithms with state-of-the-art baselines, including DP-UCB, DP-UCB-BOUND, DP-UCB-INT in the work of Tossou and Dimitrakakis (2016) and DP-SE in the work of Sajed and Sheffet (2019). We consider two problem instances. In Instance 1 we set N = 9 and the expected reward vector is [0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7]; in Instance 2 we set N = 101 and the expected reward of arm i ∈ [N] is µ_i = 0.3 + 0.004(i − 1). In both instances, we choose T = 10^6, and all the results (the average cumulative regrets and the standard deviations of cumulative regrets) are averaged over 100 independent runs. |
| Researcher Affiliation | Collaboration | Siwei Wang, Microsoft Research Asia, Beijing, China; Jun Zhu, Department of Computer Science and Technology, BNRist Center, Tsinghua AI Institute, Tsinghua–Bosch Joint ML Center, Tsinghua University, Beijing, China |
| Pseudocode | Yes | Algorithm 1: DP-FTPL-Gauss; Algorithm 2: DP-FTPL-Beta; Algorithm 3: DP-FTPL-New; Algorithm 4: GDP-Elim-New |
| Open Source Code | No | The paper does not provide any explicit statement about open-sourcing the code, a repository link, or mention of code in supplementary materials. |
| Open Datasets | No | We consider two problem instances. In Instance 1 we set N = 9 and the expected reward vector is [0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7]; in Instance 2 we set N = 101 and the expected reward of arm i ∈ [N] is µ_i = 0.3 + 0.004(i − 1). These problem instances are custom-defined within the paper's experimental section and are not external, publicly available datasets requiring a link or citation. |
| Dataset Splits | No | The paper describes problem instances for multi-armed bandits and reports results over "100 independent runs." This context does not involve traditional training/test/validation dataset splits used in other machine learning paradigms. |
| Hardware Specification | No | The paper mentions that "This work was supported by...the High Performance Computing Center, Tsinghua University." However, it does not provide specific details about the hardware used, such as GPU/CPU models, memory, or other configurations. |
| Software Dependencies | No | The paper does not provide specific software dependencies, such as programming languages, libraries, or frameworks with their version numbers. |
| Experiment Setup | Yes | In both instances, we choose T = 10^6, and all the results (the average cumulative regrets and the standard deviations of cumulative regrets) are averaged over 100 independent runs. We use Instance 1 and set (ϵ, δ) to be (1, 0.01) or (1, e^−10). In Fig. 4(c) and Fig. 4(d), we use Instance 2 and set (ϵ, δ) to be (1, 0.01) or (1, e^−10). Algorithm 1 (DP-FTPL-Gauss): for each arm i, set N_i = 0, R_i = 0. While N_i < N_G = min{ 1/(4πδ²), (1/ϵ²) log(e/(4πδ²)) } do: pull arm i(t) = i and observe reward x(t) = x_i(t) ∈ {0, 1}; R_i ← R_i + x_i(t), N_i ← N_i + 1; end while. |
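The experimental setup quoted above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it builds the two reward instances as described, and computes the initial uniform-exploration budget N_G = min{1/(4πδ²), (1/ϵ²)·log(e/(4πδ²))} from the quoted DP-FTPL-Gauss pseudocode for the two privacy settings used, (ϵ, δ) = (1, 0.01) and (1, e^−10). Bernoulli rewards are assumed, consistent with x_i(t) ∈ {0, 1}.

```python
import math

# Problem instances as described in the paper's experiments.
instance1 = [0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7]       # N = 9
instance2 = [0.3 + 0.004 * (i - 1) for i in range(1, 102)]          # N = 101, mu_i = 0.3 + 0.004(i - 1)

def exploration_budget(eps: float, delta: float) -> float:
    """Per-arm pull threshold N_G from the quoted DP-FTPL-Gauss pseudocode:
    N_G = min{ 1/(4*pi*delta^2), (1/eps^2) * log(e / (4*pi*delta^2)) }."""
    a = 1.0 / (4.0 * math.pi * delta ** 2)
    b = (1.0 / eps ** 2) * math.log(math.e / (4.0 * math.pi * delta ** 2))
    return min(a, b)

# The two privacy settings used in the experiments.
for delta in (0.01, math.exp(-10)):
    print(f"delta = {delta:.3e}: N_G = {exploration_budget(1.0, delta):.2f}")
```

With ϵ = 1, the logarithmic branch of the min is the binding one for both values of δ, so the exploration phase pulls each arm only a handful of times before the algorithm proceeds.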