Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits

Authors: Yue Kang, Cho-Jui Hsieh, Thomas Lee

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we show by experiments that our hyperparameter tuning framework outperforms the theoretical hyperparameter setting and other tuning methods with various (generalized) linear bandit algorithms. We run comprehensive experiments on both simulations and real-world datasets. Specifically, for the real data, we use the benchmark Movielens 100K dataset along with the Yahoo News dataset."
Researcher Affiliation | Collaboration | Yue Kang (University of California, Davis); Cho-Jui Hsieh (Google and University of California, Los Angeles); Thomas C. M. Lee (University of California, Davis)
Pseudocode | Yes | "Algorithm 1: Zooming TS algorithm with Restarts"
Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets | Yes | "We run comprehensive experiments on both simulations and real-world datasets. Specifically, for the real data, we use the benchmark Movielens 100K dataset along with the Yahoo News dataset... Movielens 100K dataset: This dataset contains 100K ratings from 943 users on 1,682 movies. For data pre-processing, we utilize LIBPMF (Yu et al., 2014)... Yahoo News dataset: We downloaded the Yahoo Recommendation dataset R6A, which contains Yahoo data from May 1 to May 10, 2009... We transform the contextual information into a 6-dimensional vector based on the processing in (Chu et al., 2009)."
Dataset Splits | Yes | "We also include a warming-up period of length T1 in the beginning to guarantee sufficient exploration... Specifically, for LinUCB, LinTS, UCB-GLM, GLM-TSL and Laplace-TS, we choose it to be 118. For GLOC and SGD-TS, we set it as 45."
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It only mentions running times in seconds without specifying the computational environment.
Software Dependencies | No | The paper mentions using LIBPMF (Yu et al., 2014) for data preprocessing but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup | Yes | "We run comprehensive experiments on both simulations and real-world datasets... We believe a large value of warm-up period T1 may abandon some useful information in practice, and hence we use T1 = T^{2/(p+3)} according to Theorem 4.2 in experiments. And we would restart our hyperparameter tuning layer after every T2 = 3T^{(p+2)/(p+3)} rounds... The time horizon T is set to 14,000. For linear models, the expected reward of arm a is formulated as x_{t,a}^T θ and random noise is sampled from N(0, 0.5)... Each experiment is repeated 20 times."
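The quoted setup can be sketched numerically. The snippet below is a minimal illustration, not the paper's implementation: it computes the warm-up length T1 = T^{2/(p+3)} and restart period T2 = 3T^{(p+2)/(p+3)}, and samples a linear-model reward x^T θ with N(0, 0.5) noise. Here p is assumed to denote the number of hyperparameters being tuned, and 0.5 is treated as the noise standard deviation; both readings are assumptions where the quotes are ambiguous.

```python
import numpy as np

def tuning_schedule(T: int, p: int) -> tuple[int, int]:
    """Warm-up length T1 = T^(2/(p+3)) and restart period T2 = 3*T^((p+2)/(p+3)),
    truncated to integers. p is assumed to be the number of tuned hyperparameters."""
    T1 = int(T ** (2 / (p + 3)))
    T2 = int(3 * T ** ((p + 2) / (p + 3)))
    return T1, T2

def linear_reward(x: np.ndarray, theta: np.ndarray, rng: np.random.Generator) -> float:
    """Linear-model reward: expected value x^T theta plus Gaussian noise.
    (0.5 is treated as the standard deviation here; the quote is ambiguous.)"""
    return float(x @ theta + rng.normal(0.0, 0.5))

# With T = 14,000 as in the paper, p = 1 gives T1 = 118 and p = 2 gives T1 = 45,
# matching the warm-up lengths quoted for the two groups of algorithms.
T1, T2 = tuning_schedule(14_000, 1)
```

With T = 14,000, `tuning_schedule` reproduces the reported warm-up lengths (118 for one tuned hyperparameter, 45 for two), which is consistent with the "Dataset Splits" quote above.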