Thompson Sampling Algorithms for Cascading Bandits

Authors: Zixin Zhong, Wang Chi Cheung, Vincent Y. F. Tan

JMLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments demonstrate the superiority of the proposed TS algorithms compared to existing UCB-based ones.
Researcher Affiliation | Academia | Zixin Zhong, Department of Mathematics, National University of Singapore, 119076, Singapore; Wang Chi Cheung, Department of Industrial Systems and Management, National University of Singapore, 117576, Singapore; Vincent Y. F. Tan, Department of Electrical and Computer Engineering and Department of Mathematics, National University of Singapore, 117583, Singapore.
Pseudocode | Yes | Algorithm 1 (CTS: Thompson Sampling for Cascading Bandits with Beta Update); Algorithm 2 (TS-Cascade: Thompson Sampling for Cascading Bandits with Gaussian Update); Algorithm 3 (LinTS-Cascade(λ)); Algorithm 4 (Generate feature matrix with historical data; Zong et al., 2016).
Open Source Code | Yes | The code to reproduce all the experiments can be found at https://github.com/zixinzh/2021-JMLR.git.
Open Datasets | No | To begin with, we generate independent Bernoulli random variables with click probabilities w1, w2, or w3 for each item to produce the matrix Atrain ∈ {0, 1}^{m×L}, in which m = 200.
Dataset Splits | No | The paper describes generating synthetic data and varying the parameters (L, K, w1, w2, w3) for simulations, but does not specify training/validation/test splits for a fixed dataset. The experimental setup is an online learning scenario rather than evaluation on a fixed dataset split.
Hardware Specification | No | The paper does not specify the hardware used for the experiments, such as CPU or GPU models or memory capacity; it only reports average running times.
Software Dependencies | No | The paper does not specify the software or library versions used to implement the algorithms or run the experiments.
Experiment Setup | Yes | For the L items, we set the click probabilities of K of them to be w1 = 0.2, the probabilities of another K of them to be w2 = 0.1, and the probabilities of the remaining L − 2K items to be w3 = 0.05. We vary L ∈ {16, 32, 64, 256, 512, 1024, 2048} and K ∈ {2, 4}. We conduct 20 independent simulations with each algorithm under each setting of L, K. ... We vary the tuning parameters for each algorithm under consideration on a logarithmic scale. We present the best performance of each algorithm for cascading bandits with linear generalization with tuning parameters in the finite set {0.01, 0.1, 1, 10, 100}.
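The synthetic setup described above, together with the CTS (Beta-update) algorithm the report references, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the horizon T = 1000, the uniform Beta(1, 1) priors, and all variable names are choices made for the sketch; only the sizes L, K, m and the probabilities w1, w2, w3 come from the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Problem sizes from the paper's setup: L items, recommend K per round,
# m = 200 Bernoulli samples per item for the synthetic matrix Atrain.
L, K, m = 16, 2, 200
w1, w2, w3 = 0.2, 0.1, 0.05

# Ground-truth click probabilities: K items at w1, K at w2, the rest at w3.
w = np.concatenate([np.full(K, w1), np.full(K, w2), np.full(L - 2 * K, w3)])

# Synthetic click matrix Atrain in {0,1}^{m x L}: independent Bernoulli
# draws per item, as in the paper's data-generation step.
A_train = (rng.random((m, L)) < w).astype(int)

# Minimal CTS sketch (Thompson sampling with Beta posteriors): sample
# theta ~ Beta(a, b), recommend the top-K items by sampled score, observe
# cascading feedback (first click, if any), update only the examined items.
a = np.ones(L)  # Beta posterior "success" counts (assumed Beta(1,1) prior)
b = np.ones(L)  # Beta posterior "failure" counts
T = 1000        # assumed horizon for the sketch
regret = 0.0
best_reward = 1.0 - np.prod(1.0 - np.sort(w)[::-1][:K])

for t in range(T):
    theta = rng.beta(a, b)
    ranked = np.argsort(theta)[::-1][:K]   # recommended list of K items
    clicked = -1
    for pos, item in enumerate(ranked):
        if rng.random() < w[item]:         # user clicks the first attractive item
            clicked = pos
            break
    # Cascading feedback: items before the click are observed unclicked,
    # the clicked item is observed clicked, items after it are unobserved.
    last = clicked if clicked >= 0 else K - 1
    for pos in range(last + 1):
        item = ranked[pos]
        if pos == clicked:
            a[item] += 1
        else:
            b[item] += 1
    regret += best_reward - (1.0 - np.prod(1.0 - w[ranked]))
```

Cumulative regret accumulated this way is the quantity the paper's experiments compare across TS- and UCB-based algorithms; sweeping L and K as in the setup row above reproduces the shape of that comparison, not its exact numbers.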