Exploration, Exploitation, and Engagement in Multi-Armed Bandits with Abandonment
Authors: Zixian Yang, Xin Liu, Lei Ying
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Simulation results in Section 5 and Appendix D confirm our theoretical results and show that our algorithms have significantly lower regret than the traditional UCB, KL-UCB, and Q-learning-based algorithms. |
| Researcher Affiliation | Academia | Zixian Yang EMAIL Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA; Xin Liu EMAIL School of Information Science and Technology, ShanghaiTech University, Shanghai, China; Lei Ying EMAIL Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA |
| Pseudocode | Yes | Algorithm 1 ULCB Algorithm; Algorithm 2 CONT-ULCB Algorithm |
| Open Source Code | No | The paper does not explicitly state that source code for the proposed ULCB or KL-ULCB algorithms is provided, nor does it include any links to code repositories. It discusses comparisons to existing algorithms but not its own implementation availability. |
| Open Datasets | No | The paper presents "Simulation results" and defines the parameters for these simulations (e.g., M=2, µ(a1)=0.9, µ(a2)=0.8, abandonment probabilities). It does not use or provide access to any external, publicly available datasets in the traditional sense for its experimental evaluation. |
| Dataset Splits | No | The paper conducts simulations and mentions the number of episodes and independent runs ("2 * 10^4 episodes with 10^7 independent runs"). However, it does not involve traditional dataset splits (e.g., training, validation, test sets) as it is simulation-based rather than using pre-collected data. |
| Hardware Specification | No | The paper describes the simulation parameters and results but does not specify any hardware (e.g., GPU, CPU models, memory) used to run these simulations. |
| Software Dependencies | No | The paper mentions comparing with various algorithms like UCB, KL-UCB, Q-learning, and UCBVI, and details some of their parameters (e.g., ϵ=0.1 for Q-learning, learning rate, discount factor γ=1). However, it does not specify the software libraries, frameworks (e.g., Python, PyTorch, TensorFlow), or their version numbers used for the implementations of any algorithms discussed in the paper. |
| Experiment Setup | Yes | In the simulation, we assume S_{k,1} = 1 for simplicity. This is to say that the user assumes a class of items is good if the user has not yet seen the items. Let M = 2, µ(a1) = 0.9, and µ(a2) = 0.8. Note that for all the algorithms in the simulation, we do not include the log log terms (i.e., c = 0) in the indices, which are also omitted in (Garivier and Cappé, 2011). We simulated 2 × 10^4 episodes with 10^7 independent runs. We set c1 = 1, c0 = 1 for ULCB and c1 = c0 = 1 for KL-ULCB. We also compare our algorithms with Q-learning (Watkins, 1989) with ϵ-greedy, Q-learning with UCB (Yang et al., 2021), and UCBVI (Azar et al., 2017). For Q-learning with ϵ-greedy, at each step we select a random action with probability ϵ and a greedy action according to the Q table with probability 1 − ϵ. At each step, we update the Q table based on the update formula in (Watkins, 1989) with discount factor γ = 1. For Q-learning with UCB, at each step we select a greedy action according to the estimated Q table. At each step, we update the Q table with an additional bonus term... For UCBVI, we also set the episode length parameter H to be the expected episode length under the best policy, and the number of episodes K is set to 1000. The probability parameter δ is set to 0.001. The bonus term is modified to b(s, a) = 7H log(5SAHK/δ) √(1 / min{N(a), N[s,1], N[s,0]})... Simulations for the general-state setting can be found in Appendix C.5. Consider the MAB-A problem in the general-state setting. Let the abandonment probability function q(·) be q(s) = 1 − log(c6 s + 1) ... for any s ∈ [0, 1], where c6 is a constant. Let the forgetting factor θ = 0.5 in the simulation. ... Let n = 4 for the discretization of DISC-ULCB and DISC-KL-ULCB. |
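
The experiment setup above (M = 2 arms with means µ(a1) = 0.9 and µ(a2) = 0.8, episodes ending when the user abandons) can be sketched as a minimal simulation. This is not the paper's ULCB or KL-ULCB algorithm: the UCB1-style index, the constant abandonment probability `q_abandon`, and the "abandon after a zero reward" rule are illustrative assumptions standing in for the paper's state-dependent abandonment model.

```python
import math
import random


def run_episode(mu, q_abandon, c1=1.0, rng=random):
    """Simulate one episode of a 2-armed bandit with abandonment.

    Assumptions (not from the paper): the user abandons with fixed
    probability q_abandon after each zero reward, and arms are chosen
    by a UCB1-style index: empirical mean + sqrt(c1 * log(t) / n).
    Returns (total reward, episode length).
    """
    counts = [0] * len(mu)   # pulls per arm
    sums = [0.0] * len(mu)   # cumulative reward per arm
    t = 0
    total_reward = 0.0
    while True:
        t += 1
        # Untried arms get an infinite index so each is pulled once.
        idx = [float("inf") if counts[a] == 0
               else sums[a] / counts[a] + math.sqrt(c1 * math.log(t) / counts[a])
               for a in range(len(mu))]
        a = max(range(len(mu)), key=lambda i: idx[i])
        r = 1.0 if rng.random() < mu[a] else 0.0  # Bernoulli reward
        counts[a] += 1
        sums[a] += r
        total_reward += r
        # Illustrative abandonment rule: leave after a bad recommendation.
        if r == 0.0 and rng.random() < q_abandon:
            return total_reward, t


random.seed(0)
mu = [0.9, 0.8]  # arm means from the paper's setup
rewards = [run_episode(mu, q_abandon=0.5)[0] for _ in range(2000)]
print(sum(rewards) / len(rewards))
```

Because every step yields a zero reward with probability at least 0.1, each episode terminates with probability one; the printed value is the average per-episode reward over 2000 episodes, far fewer than the 2 × 10^4 episodes and 10^7 runs reported in the paper.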