Sliding-Window Thompson Sampling for Non-Stationary Settings

Authors: Francesco Trovò, Stefano Paladino, Marcello Restelli, Nicola Gatti

JAIR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Furthermore, we empirically show that SW-TS dramatically outperforms state-of-the-art algorithms even when the forms of non-stationarity are taken separately, as previously studied in the literature."
Researcher Affiliation | Academia | Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria, Piazza Leonardo da Vinci 32, Milano, 20133, Italy
Pseudocode | Yes | "The pseudocode of SW-TS for Bernoulli distributed rewards is presented in Algorithm 1."
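The paper's Algorithm 1 is not reproduced on this page. As a rough aid to reimplementation, here is a minimal Python sketch of sliding-window Thompson Sampling for Bernoulli rewards: Beta posteriors are computed from only the last τ plays, so stale evidence is forgotten in non-stationary environments. The `mean_at` helper is a hypothetical stand-in for the environment, not something from the paper.

```python
import random
from collections import deque

def sw_ts_bernoulli(K, N, tau, mean_at, rng=None):
    """Sketch of Sliding-Window Thompson Sampling for Bernoulli rewards.

    `mean_at(t)` is a hypothetical caller-supplied function returning the
    K expected rewards at round t (it plays the role of the environment).
    Only the last `tau` (arm, reward) pairs inform the Beta posteriors.
    """
    rng = rng or random.Random(0)
    window = deque(maxlen=tau)   # old observations fall out automatically
    total_reward = 0
    for t in range(N):
        succ, fail = [0] * K, [0] * K
        for arm, r in window:    # counts restricted to the sliding window
            if r:
                succ[arm] += 1
            else:
                fail[arm] += 1
        # One posterior draw per arm from Beta(1 + successes, 1 + failures)
        theta = [rng.betavariate(1 + succ[i], 1 + fail[i]) for i in range(K)]
        arm = max(range(K), key=theta.__getitem__)
        reward = 1 if rng.random() < mean_at(t)[arm] else 0
        window.append((arm, reward))
        total_reward += reward
    return total_reward
```

Recomputing the window counts each round is O(τ) per step; a production implementation would update the counts incrementally as observations enter and leave the window.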
Open Source Code | No | The paper does not contain any explicit statements about releasing the source code, nor does it provide links to a code repository.
Open Datasets | No | The paper describes generating synthetic data for its experiments: "The expected value µ_{i,φ} is chosen randomly for every arm a_i during each phase." (Section 5.1); "the expected value µ_{i,t} of arm a_i changes according to the following function: w(t) = 1 + (K − 1)(1 + sin(tσ))" (Section 5.2). There is no mention of using a publicly available dataset or of providing access to the generated datasets.
Dataset Splits | No | The paper describes how the *time horizon* for the multi-armed bandit problem is structured (e.g., "We split the time horizon N into four phases of equal length"). However, it does not provide traditional training/validation/test dataset splits, as experiments are conducted sequentially in a dynamically changing environment rather than on a pre-partitioned static dataset.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models, processor types, or memory amounts.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers, such as programming languages or libraries.
Experiment Setup | Yes | "We use a time horizon N ∈ {10^4, 10^5, 10^6} and a number of arms K ∈ {5, 10, 20, 30}. We split the time horizon N into four phases of equal length. The expected value µ_{i,φ} is chosen randomly for every arm a_i during each phase. In particular, after every breakpoint, the expected value µ_{i,φ} of each arm a_i is drawn from a uniform probability distribution over [0, 1]... For the sake of comparison, we choose a sliding window τ = 4√(N log(N)) as is discussed by Garivier and Moulines (2008). We generate 10 configurations for every combination of N and K as discussed above, and we provide the results averaged over the configurations and over 100 independent trials for each configuration. ... The expected value µ_{i,t} of arm a_i changes according to the following function: w(t) = 1 + (K − 1)(1 + sin(tσ)). ... with σ = 0.0001. ... Thus, we set τ = σ^(−4/5)."
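The abruptly changing setting quoted above (four equal-length phases, each arm's mean redrawn uniformly on [0, 1] at every breakpoint) is straightforward to regenerate. A minimal Python sketch, with a hypothetical `abrupt_env` name and seed chosen here for illustration, together with the window size τ = 4√(N log N) used for comparison:

```python
import math
import random

def abrupt_env(K, N, n_phases=4, seed=42):
    """Piecewise-constant Bernoulli environment: the horizon N is split
    into n_phases equal phases, and after every breakpoint each arm's
    expected reward is redrawn uniformly on [0, 1]."""
    rng = random.Random(seed)
    means = [[rng.random() for _ in range(K)] for _ in range(n_phases)]
    phase_len = N // n_phases
    def mean_at(t):
        # Clamp so any leftover rounds stay in the final phase
        return means[min(t // phase_len, n_phases - 1)]
    return mean_at

def window_size(N):
    """Sliding window tau = 4 * sqrt(N * log(N)), as in the comparison
    setup attributed to Garivier and Moulines (2008)."""
    return int(4 * math.sqrt(N * math.log(N)))
```

The paper averages results over 10 randomly generated configurations per (N, K) pair and over 100 independent trials per configuration; reproducing that would mean looping this generator over fresh seeds.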