reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

p-Mean Regret for Stochastic Bandits

Authors: Anand Krishna, Philips George John, Adarsh Barik, Vincent Y. F. Tan

AAAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We have also performed some synthetic experiments to compare the p-mean-regrets achieved by EXPLORE-THEN-UCB versus NCB and UCB1 as baselines; these results can be found in the extended version (Krishna et al. 2024).
Researcher Affiliation	Academia	(1) Department of Electrical and Computer Engineering, NUS; (2) CNRS-CREATE & Department of Computer Science, NUS; (3) Institute of Data Science, NUS; (4) Department of Mathematics, NUS
Pseudocode	Yes	Algorithm 1: EXPLORE-THEN-UCB. Parameters: time horizon $T$, number of arms $k$, exploration period $\tilde{T}$. For $t \leftarrow 1, \dots, \tilde{T}$ do: uniformly sample $i_t$ from $[k]$; pull arm $i_t$ and observe the reward $X_t$; increment $n_{i_t,t}$ by one and update $\hat{\mu}_{i_t,t}$. For $t \leftarrow \tilde{T}+1, \dots, T$ do: let $\mathrm{UCB}_{i,t-1} \leftarrow \hat{\mu}_{i,t-1} + 4\sqrt{\log T / n_{i,t-1}}$; select $i_t \leftarrow \arg\max_{i \in [k]} \mathrm{UCB}_{i,t-1}$; pull arm $i_t$ and observe the reward $X_t$; update $n_{i_t,t}$ and $\hat{\mu}_{i_t,t}$.
Open Source Code	Yes	Code https://github.com/philips-george/p-mean-regret-stochastic-bandits
Open Datasets	No	The paper does not explicitly refer to any named datasets or provide access information for a publicly available or open dataset. It discusses the stochastic Multi-Armed Bandit problem framework, which uses simulated or generated rewards, but no specific dataset is identified.
Dataset Splits	No	The paper does not use or refer to any specific publicly available dataset; therefore, no dataset split information is provided.
Hardware Specification	No	The paper mentions 'synthetic experiments' in the future work section, but does not provide any specific details about the hardware used for these experiments.
Software Dependencies	No	The paper describes a new algorithm and theoretical results but does not specify any software dependencies with version numbers.
Experiment Setup	No	The paper describes the parameters of its proposed algorithm (e.g., 'Time horizon $T$, number of arms $k$, exploration period $\tilde{T}$') but does not provide specific experimental setup details such as hyperparameters, learning rates, or batch sizes for actual experiments mentioned in the future work section.
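
The two-phase procedure quoted in the Pseudocode row can be sketched in Python as follows. This is a minimal illustration, not the authors' released code. The Bernoulli reward model, the function name explore_then_ucb, the argument names, and the rule that an arm never pulled during exploration gets an infinite index are all assumptions made here. The exploration length is left as a free argument because this page does not state how the paper sets it.

```python
import math
import random


def explore_then_ucb(means, horizon, explore_len, seed=0):
    """Sketch of the quoted two-phase procedure on Bernoulli arms.

    `means` are hypothetical arm means in [0, 1]; the Bernoulli reward
    model is an assumption made for this example only.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k        # n_i: number of pulls of arm i so far
    estimates = [0.0] * k   # mu_hat_i: empirical mean reward of arm i
    pulls = []              # sequence of arms pulled

    def pull(arm):
        # Draw a Bernoulli reward and update the running mean incrementally.
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        pulls.append(arm)

    # Phase 1: uniform exploration for the first `explore_len` rounds.
    for _ in range(explore_len):
        pull(rng.randrange(k))

    def ucb(i):
        # Index from the quoted pseudocode: mu_hat + 4 * sqrt(log T / n).
        # Assumption: an arm with no pulls is given an infinite index.
        if counts[i] == 0:
            return math.inf
        return estimates[i] + 4.0 * math.sqrt(math.log(horizon) / counts[i])

    # Phase 2: pull the arm with the largest upper confidence bound.
    for _ in range(explore_len, horizon):
        pull(max(range(k), key=ucb))

    return pulls, counts, estimates


if __name__ == "__main__":
    # Illustrative values only; not taken from the paper's experiments.
    _, counts, estimates = explore_then_ucb([0.2, 0.5, 0.8], horizon=2000, explore_len=200)
    print(counts, [round(m, 3) for m in estimates])
```

Passing a fixed seed makes a run repeatable, which is the kind of detail the Experiment Setup row looks for.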