Satisficing Regret Minimization in Bandits

Authors: Qing Feng, Tianyi Ma, Ruihao Zhu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, in Section 6 we conduct numerical experiments to demonstrate the performance of SELECT for several popular bandit optimization settings. ... Our results reveal that in the realizable case, SELECT does achieve constant satisficing regret."
Researcher Affiliation | Academia | Qing Feng (Cornell University), Tianyi Ma (Shanghai Jiao Tong University), Ruihao Zhu (Cornell University)
Pseudocode | Yes | "The pseudo-code of SELECT is presented in Algorithm 1."
Open Source Code | No | The paper does not provide any explicit statement about releasing source code, and it contains no links to repositories.
Open Datasets | No | The paper describes creating specific problem instances for its experiments (e.g., "an instance of 4 arms, and the expected reward of all arms are {0.6, 0.7, 0.8, 1}" for finite-armed bandits, the concave reward function r(x) = 1 - 16(x - 0.25)^2 for concave bandits, and a two-dimensional Lipschitz bandit on the domain (x, y) ∈ [0, 1]^2). These are custom-generated experimental setups, not references to external, publicly available datasets.
Dataset Splits | No | The paper runs experiments in simulated bandit environments over a time horizon rather than on traditional datasets, so training/validation/test splits are not applicable in the usual sense.
Hardware Specification | Yes | "All experiments are run on a PC with 4-core CPU and 16GB of RAM."
Software Dependencies | No | The paper mentions various algorithms and benchmarks (e.g., Thompson sampling, SAT-UCB, a convex bandit algorithm, a uniformly discretized UCB), but it does not name any specific software libraries, frameworks, or operating systems with version numbers.
Experiment Setup | Yes | "Setup: In this case, we consider an instance of 4 arms, and the expected rewards of the arms are {0.6, 0.7, 0.8, 1}. The length of the time horizon is set to 500 to 5000 with a step size of 500. We conduct experiments for both the realizable case and the non-realizable case. For the realizable case, we set the satisficing level S = 0.93; for the non-realizable case, we set the satisficing level S = 1.5. In both cases, we compare SELECT against Thompson sampling in Agrawal & Goyal (2017) as well as SAT-UCB and SAT-UCB+ in Michel et al. (2023). For SELECT, we use Thompson sampling as the learning oracle. The experiment is repeated 1000 times and we report the average results."
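The custom problem instances quoted in the Open Datasets row are simple enough to write down directly. A minimal sketch in Python; the function names are ours (the paper does not provide reference code), and treating the finite-armed rewards as arm means is an assumption:

```python
# Illustrative encodings of the three custom instances described above.
# All names here are ours; the paper provides no reference implementation.

def finite_armed_means():
    """4-arm instance: expected rewards {0.6, 0.7, 0.8, 1}."""
    return [0.6, 0.7, 0.8, 1.0]

def concave_reward(x):
    """Concave reward r(x) = 1 - 16(x - 0.25)^2, maximized at x = 0.25."""
    return 1 - 16 * (x - 0.25) ** 2

def in_lipschitz_domain(x, y):
    """Two-dimensional Lipschitz bandit domain (x, y) in [0, 1]^2."""
    return 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0
```

Since these instances are generated in closed form, the experiments depend only on random seeds rather than on any external dataset, which is consistent with the "No" result above.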
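The finite-armed setup described in the last row can be sketched end to end. This is a hedged sketch, not the paper's SELECT implementation: it runs plain Thompson sampling (one of the reported baselines) on the 4-arm instance with Bernoulli rewards, and measures satisficing regret as the sum over rounds of max(S - mu_a, 0), a standard form of the quantity. The helper names, the Bernoulli reward model, and the number of repetitions are our assumptions:

```python
# Sketch: Thompson sampling baseline on a 4-arm Bernoulli bandit with means
# {0.6, 0.7, 0.8, 1.0}, measuring satisficing regret at level S = 0.93
# (the realizable case: the best arm's mean exceeds S).
import random

MEANS = [0.6, 0.7, 0.8, 1.0]
S = 0.93  # satisficing level

def thompson_run(horizon, seed=0):
    rng = random.Random(seed)
    alpha = [1] * len(MEANS)  # Beta(1, 1) posterior parameters per arm
    beta = [1] * len(MEANS)
    regret = 0.0
    for _ in range(horizon):
        # Sample a mean estimate from each arm's posterior; pull the argmax.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(len(MEANS))]
        arm = max(range(len(MEANS)), key=lambda i: samples[i])
        reward = 1 if rng.random() < MEANS[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        # Satisficing regret penalizes only pulls of arms with mean below S.
        regret += max(S - MEANS[arm], 0.0)
    return regret

# Horizons 500 to 5000 with a step size of 500, as in the paper's setup;
# we average over 5 seeds here rather than the paper's 1000 repetitions.
for T in range(500, 5001, 500):
    avg = sum(thompson_run(T, seed) for seed in range(5)) / 5
    print(T, round(avg, 2))
```

Reproducing the paper's headline finding (constant satisficing regret in the realizable case) would additionally require implementing SELECT from Algorithm 1 with this Thompson sampling routine as its learning oracle, which the paper specifies only in pseudocode.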