Sliding-Window Thompson Sampling for Non-Stationary Settings

Authors: Francesco Trovò, Stefano Paladino, Marcello Restelli, Nicola Gatti

JAIR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Furthermore, we empirically show that SW-TS dramatically outperforms state-of-the-art algorithms even when the forms of non-stationarity are taken separately, as previously studied in the literature."
Researcher Affiliation | Academia | Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria, Piazza Leonardo da Vinci 32, Milano, 20133, Italy
Pseudocode | Yes | "The pseudocode of SW-TS for Bernoulli distributed rewards is presented in Algorithm 1."
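The paper's Algorithm 1 is not reproduced on this page. As a rough aid to reimplementation, here is a minimal Python sketch of sliding-window Thompson Sampling for Bernoulli rewards: Beta posteriors are computed from only the last τ plays, so stale evidence is forgotten in non-stationary environments. The `mean_at` helper is a hypothetical stand-in for the environment, not something from the paper.

```python
import random
from collections import deque

def sw_ts_bernoulli(K, N, tau, mean_at, rng=None):
    """Sketch of Sliding-Window Thompson Sampling for Bernoulli rewards.

    `mean_at(t)` is a hypothetical caller-supplied function returning the
    K expected rewards at round t (it plays the role of the environment).
    Only the last `tau` (arm, reward) pairs inform the Beta posteriors.
    """
    rng = rng or random.Random(0)
    window = deque(maxlen=tau)   # old observations fall out automatically
    total_reward = 0
    for t in range(N):
        succ, fail = [0] * K, [0] * K
        for arm, r in window:    # counts restricted to the sliding window
            if r:
                succ[arm] += 1
            else:
                fail[arm] += 1
        # One posterior draw per arm from Beta(1 + successes, 1 + failures)
        theta = [rng.betavariate(1 + succ[i], 1 + fail[i]) for i in range(K)]
        arm = max(range(K), key=theta.__getitem__)
        reward = 1 if rng.random() < mean_at(t)[arm] else 0
        window.append((arm, reward))
        total_reward += reward
    return total_reward
```

Recomputing the window counts each round is O(τ) per step; a production implementation would update the counts incrementally as observations enter and leave the window.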
Open Source Code | No | The paper does not contain any explicit statements about releasing the source code, nor does it provide links to a code repository.
Open Datasets | No | The paper describes generating synthetic data for its experiments: "The expected value µ_{i,φ} is chosen randomly for every arm a_i during each phase." (Section 5.1); "the expected value µ_{i,t} of arm a_i changes according to the following function: w(t) = 1 + (K − 1)(1 + sin(tσ))" (Section 5.2). There is no mention of using a publicly available dataset or of providing access to the generated datasets.
Dataset Splits | No | The paper describes how the *time horizon* for the multi-armed bandit problem is structured (e.g., "We split the time horizon N into four phases of equal length"). However, it does not provide traditional training/validation/test dataset splits, as experiments are conducted sequentially in a dynamically changing environment rather than on a pre-partitioned static dataset.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models, processor types, or memory amounts.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers, such as programming languages or libraries.
Experiment Setup | Yes | "We use a time horizon N ∈ {10^4, 10^5, 10^6} and a number of arms K ∈ {5, 10, 20, 30}. We split the time horizon N into four phases of equal length. The expected value µ_{i,φ} is chosen randomly for every arm a_i during each phase. In particular, after every breakpoint, the expected value µ_{i,φ} of each arm a_i is drawn from a uniform probability distribution over [0, 1]... For the sake of comparison, we choose a sliding window τ = 4√(N log(N)) as is discussed by Garivier and Moulines (2008). We generate 10 configurations for every combination of N and K as discussed above, and we provide the results averaged over the configurations and over 100 independent trials for each configuration. ... The expected value µ_{i,t} of arm a_i changes according to the following function: w(t) = 1 + (K − 1)(1 + sin(tσ)). ... with σ = 0.0001. ... Thus, we set τ = σ^(−4/5)."
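The abruptly changing setting quoted above (four equal-length phases, each arm's mean redrawn uniformly on [0, 1] at every breakpoint) is straightforward to regenerate. A minimal Python sketch, with a hypothetical `abrupt_env` name and seed chosen here for illustration, together with the window size τ = 4√(N log N) used for comparison:

```python
import math
import random

def abrupt_env(K, N, n_phases=4, seed=42):
    """Piecewise-constant Bernoulli environment: the horizon N is split
    into n_phases equal phases, and after every breakpoint each arm's
    expected reward is redrawn uniformly on [0, 1]."""
    rng = random.Random(seed)
    means = [[rng.random() for _ in range(K)] for _ in range(n_phases)]
    phase_len = N // n_phases
    def mean_at(t):
        # Clamp so any leftover rounds stay in the final phase
        return means[min(t // phase_len, n_phases - 1)]
    return mean_at

def window_size(N):
    """Sliding window tau = 4 * sqrt(N * log(N)), as in the comparison
    setup attributed to Garivier and Moulines (2008)."""
    return int(4 * math.sqrt(N * math.log(N)))
```

The paper averages results over 10 randomly generated configurations per (N, K) pair and over 100 independent trials per configuration; reproducing that would mean looping this generator over fresh seeds.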