Regret Bounds for Satisficing in Multi-Armed Bandit Problems

Authors: Thomas Michel, Hossein Hajiabolhassan, Ronald Ortner

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. We compared Sat-UCB to other bandit algorithms to show that the latter keep accumulating S-regret, while Sat-UCB sticks to a satisficing arm after finite time, thus confirming the results of Theorem 2. We also investigated the behavior of Sat-UCB in the not realizable case with different values for the chosen satisfaction level S, and ran experiments with a slightly modified version, Sat-UCB+, introduced in Section 4.2.
Researcher Affiliation: Academia. Thomas Michel (EMAIL), Université Paris-Saclay, ENS Paris-Saclay, France; Hossein Hajiabolhassan (EMAIL), Institute of Human Genetics, Diagnostic and Research Center for Molecular Biomedicine, Medical University of Graz, Austria; Ronald Ortner (EMAIL), Lehrstuhl für Informationstechnologie, Montanuniversität Leoben, Austria.
Pseudocode: Yes. Algorithm 1: Simple-Sat (Simple Algorithm for Satisficing in the Realizable Case); Algorithm 2: Sat-UCB Scheme for Satisficing in the General Case; Algorithm 3: Sat-UCB+ (Experimental Simplification of Sat-UCB).
Open Source Code: No. The paper does not contain any explicit statement about providing open-source code, nor does it provide links to any code repositories.
Open Datasets: No. The paper describes generating data for experiments using "normally distributed rewards with standard deviation 1" in specific settings. It does not refer to any established public datasets or provide access information for its generated data.
Dataset Splits: No. The paper describes a multi-armed bandit problem, which typically involves online learning and simulation rather than static dataset splits. It specifies how reward means are set for the arms (e.g., the mean reward of each arm i = 1, 2, ..., 20 is set to i/20) but does not discuss training, validation, or test splits of a fixed dataset.
Hardware Specification: No. The paper does not provide any specific details about the hardware used to conduct the experiments.
Software Dependencies: No. The paper mentions various algorithms (e.g., UCB1, UCBalpha, UCL) but does not specify any software libraries or programming languages with version numbers used for their implementation.
Experiment Setup: Yes. To illustrate the influence of the structure of the underlying bandit problem, the authors performed experiments in the following two settings, each with 20 arms and normally distributed rewards with standard deviation 1. In Setting 1 the mean reward of each arm i = 1, 2, ..., 20 is set to i/20. For the satisfaction level they chose S = 0.8 in the realizable case, resulting in four satisfying arms; for experiments in the not realizable case, S = 1 was chosen. In Setting 2 the mean reward of each arm i is set to √(i/20). For the satisfaction level they chose S = 0.92, so that there are three satisfying arms in the realizable case. This setting is more difficult than Setting 1, as the arms are closer to S as well as to each other. The not realizable satisfaction level was set to 1.1.
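Since no code is released with the paper, the two experimental settings above can be sketched in a few lines. The following is a minimal reconstruction, not the authors' implementation: it builds the arm means of both settings, samples rewards from N(mean, 1), and tracks cumulative S-regret under the usual satisficing definition (assumed here, not quoted from the paper), namely the sum of max(S − μ_{a_t}, 0) over rounds. A standard UCB1 baseline (Auer et al., 2002) stands in for the comparison algorithms that keep exploring and hence keep accumulating S-regret.

```python
import numpy as np

rng = np.random.default_rng(0)

# Setting 1: 20 arms, mean reward of arm i (i = 1..20) is i/20.
means_1 = np.arange(1, 21) / 20.0
S1 = 0.8  # realizable satisfaction level; four arms strictly exceed it

# Setting 2: mean reward of arm i is sqrt(i/20); S = 0.92 in the realizable case.
means_2 = np.sqrt(np.arange(1, 21) / 20.0)
S2 = 0.92

def pull(means, i, rng):
    """Draw one reward from arm i: N(mean_i, 1), as in the paper's setup."""
    return rng.normal(means[i], 1.0)

def s_regret(means, choices, S):
    """Cumulative S-regret, assuming the definition sum_t max(S - mu_{a_t}, 0)."""
    return np.cumsum(np.maximum(S - means[np.asarray(choices)], 0.0))

def ucb1(means, T, rng):
    """Plain UCB1 baseline: pull each arm once, then maximize the index
    mean estimate + sqrt(2 ln t / n_i). It never commits to one arm."""
    K = len(means)
    counts = np.zeros(K)
    sums = np.zeros(K)
    choices = []
    for t in range(T):
        if t < K:
            a = t  # initialization: one pull per arm
        else:
            a = int(np.argmax(sums / counts + np.sqrt(2.0 * np.log(t) / counts)))
        counts[a] += 1
        sums[a] += pull(means, a, rng)
        choices.append(a)
    return choices
```

For example, `s_regret(means_1, ucb1(means_1, 5000, rng), S1)` yields a curve that keeps growing whenever a non-satisfying arm (mean below S) is pulled, which is the qualitative behavior the paper contrasts with Sat-UCB.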