Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Finite-time Analysis of Globally Nonstationary Multi-Armed Bandits

Authors: Junpei Komiyama, Edouard Fouché, Junya Honda

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that the proposed algorithms outperform the existing approaches in synthetic and real-world environments. Keywords: multi-armed bandits, adaptive windows, nonstationary bandits, changepoint detection, sequential learning. ... 6. Experiments. This section reports the empirical performance of our proposed algorithms in simulations. We release the source code to reproduce our experiments on GitHub (footnote 13).
Researcher Affiliation | Academia | Junpei Komiyama, EMAIL, Stern School of Business, New York University, USA; Edouard Fouché, EMAIL, Institute for Program Structures and Data Organization, Karlsruhe Institute of Technology, Germany; Junya Honda, EMAIL, Department of Systems Science, Graduate School of Informatics, Kyoto University, Japan, and Center for Advanced Intelligence Project, RIKEN, Japan
Pseudocode | Yes | Algorithm 1 ADWIN. Require: A univariate stream of values $S : (x_1, x_2, \dots) \in [0, 1]$, confidence level $\delta \in (0, 1)$. ... Algorithm 2 Thompson sampling (TS). Require: Set of arms $[K]$. ... Algorithm 3 Kullback-Leibler UCB (KL-UCB). Require: Set of arms $[K]$. ... Algorithm 4 ADS-bandit. Require: Set of arms $[K]$, confidence level $\delta$, base-bandit algorithm ... Algorithm 5 ADR-bandit. Require: Set of arms $[K]$, confidence level $\delta$, monitoring parameter $N \in \mathbb{N}$, base-bandit algorithm
Open Source Code | Yes | This section reports the empirical performance of our proposed algorithms in simulations. We release the source code to reproduce our experiments on GitHub (footnote 13). ... Footnote 13: https://github.com/edouardfouche/G-NS-MAB
Open Datasets | Yes | Bioliq: This dataset was released by Fouché et al. (2019). ... Zozo (Saito et al., 2021) is a real-world environment where the rewards are generated from an ad recommender on an e-commerce website.
Dataset Splits | No | The paper describes how synthetic and real-world environments generate rewards over time for bandit problems, but it does not specify the explicit training/validation/test splits typically used in supervised learning.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU or GPU models.
Software Dependencies | No | The paper mentions various algorithms and methods such as ADWIN2, TS, KL-UCB, and RExp3, but it does not list any specific software libraries or programming languages with version numbers.
Experiment Setup | Yes | Algorithmic hyperparameters are described in Section A. ... Table 2: List of tested hyperparameters of the algorithms. Bold letters indicate the ones reported in this paper. We set the confidence level of UCBL-CPD and ImpCPD to be the same as ADR-TS. Algorithm(s): Hyperparameters. ADS-TS and ADR-TS, ADR-KL-UCB: $\delta = 10^{-1}, 10^{-2}, 10^{-3}, 10^{-4}, 10^{-5}, 10^{-6}, 10^{-8}, 10^{-12}, 10^{-15}$. RExp3: $T = 100, 500, 1000, 5000$. SW-TS: $W = 100, 500, 1000, 5000$.
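
For readers unfamiliar with the base-bandit algorithms named in the Pseudocode row, the following is a minimal sketch of standard Bernoulli Thompson sampling with uniform Beta(1, 1) priors. It is not taken from the paper or its repository. The function name `thompson_sampling`, the simulated `arm_means`, and the seed are assumptions made for illustration, and the sketch includes no change detection, so it does not represent ADS-bandit or ADR-bandit.

```python
import random


def thompson_sampling(arm_means, horizon, seed=0):
    """Run Bernoulli Thompson sampling and return the pull count per arm.

    arm_means: hypothetical true success probabilities, used only to
    simulate rewards (an assumption of this sketch, not paper data).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(horizon):
        # Draw one sample per arm from its Beta posterior (Beta(1, 1) prior).
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(k)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(k), key=lambda i: samples[i])
        # Simulate a Bernoulli reward from the hypothetical environment.
        reward = 1 if rng.random() < arm_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


# Two-armed stationary example: the better arm should dominate the pulls.
pulls = thompson_sampling([0.2, 0.8], horizon=2000)
```

In a stationary environment like this one the posterior concentrates on the better arm. The adaptive-window wrappers listed in the table exist because that concentration becomes a liability once the reward distributions change.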