Switching Latent Bandits

Authors: Alessio Russo, Alberto Maria Metelli, Marcello Restelli

TMLR 2024

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental
  LLM response: "Finally, Section 7 shows numerical simulations on synthetic and semi-synthetic data. We provide additional experiments that highlight the difference in performance between our estimation procedure and a technique based on SD approaches."
Researcher Affiliation: Academia
  LLM response: "Alessio Russo, Department of Electronics, Information and Bioengineering, Politecnico di Milano"
Pseudocode: Yes
  LLM response: "Algorithm 1: Estimation Procedure. Input: Action Observation matrix O, number of rounds N. Algorithm 2: SL-EC Algorithm. Input: Observation model O, Exploration horizon T0, Total horizon T."
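The quoted pseudocode headers indicate that SL-EC takes an exploration horizon T0 and a total horizon T, which matches an explore-then-commit structure. The paper's actual algorithm is not reproduced here; the sketch below only illustrates a generic explore-then-commit loop under those two parameters, with all function and variable names (`sl_ec`, `pull`, `n_actions`) being illustrative assumptions, not the authors' code.

```python
import numpy as np

def sl_ec(pull, n_actions, T0, T, rng=None):
    """Hypothetical explore-then-commit sketch in the spirit of SL-EC's
    interface: explore uniformly for T0 rounds, estimate per-action mean
    rewards, then commit to the empirical best for the remaining T - T0
    rounds. This is NOT the paper's estimation procedure."""
    rng = rng or np.random.default_rng(0)
    counts = np.zeros(n_actions)
    sums = np.zeros(n_actions)
    rewards = []
    # Exploration phase: sample actions uniformly at random.
    for _ in range(T0):
        a = int(rng.integers(n_actions))
        r = pull(a)
        counts[a] += 1
        sums[a] += r
        rewards.append(r)
    # Commit phase: play the empirically best action for the rest.
    means = np.where(counts > 0, sums / np.maximum(counts, 1), -np.inf)
    best = int(np.argmax(means))
    for _ in range(T0, T):
        rewards.append(pull(best))
    return best, rewards
```

The actual SL-EC algorithm additionally uses the observation model O and the paper's Markov chain estimation procedure during exploration, neither of which is modeled in this sketch.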
Open Source Code: No
  LLM response: "The paper does not contain any explicit statement about providing source code or a link to a code repository for the described methodology."
Open Datasets: Yes
  LLM response: "In this section, we provide numerical simulations on synthetic and semi-synthetic data based on the MovieLens 1M (Harper & Konstan, 2015) dataset, demonstrating the effectiveness of the proposed Markov chain estimation procedure."
Dataset Splits: Yes
  LLM response: "From the obtained matrix, we select 70% of all ratings as a training dataset and use the remaining 30% as a test set."
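The quoted 70/30 split of individual ratings can be sketched as a random partition of the observed (user, item, rating) triples. This is a minimal illustration of such a split; the function name and array layout are assumptions, not taken from the paper.

```python
import numpy as np

def split_ratings(ratings, train_frac=0.7, seed=0):
    """Randomly partition an (n, 3) array of (user, item, rating) rows
    into a train set (train_frac of rows) and a test set (the rest).
    Illustrative sketch only, not the paper's preprocessing code."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(ratings))     # shuffle row indices
    cut = int(train_frac * len(ratings))    # 70% boundary by default
    return ratings[idx[:cut]], ratings[idx[cut:]]
```

A fixed seed keeps the partition reproducible across runs, which matters when the same split must back both the estimation and evaluation phases.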
Hardware Specification: Yes
  LLM response: "Experiments involving a higher number of states instead were not able to reach convergence with a number of samples of the order 10^5 and, by trying to increase this quantity, there were memory space problems with the used hardware (Intel i7-11th and 16G RAM)."
Software Dependencies: No
  LLM response: "The paper does not explicitly mention any specific software dependencies with version numbers."
Experiment Setup: Yes
  LLM response: "The parameters for all the baseline algorithms have been properly tuned according to the settings considered. For the specific experiments considered, we adopted scaled values for the exploration horizon T0 w.r.t. the result derived from the theory."