Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits
Authors: Yue Kang, Cho-Jui Hsieh, Thomas Lee
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we show by experiments that our hyperparameter tuning framework outperforms the theoretical hyperparameter setting and other tuning methods with various (generalized) linear bandit algorithms. We run comprehensive experiments on both simulations and real-world datasets. Specifically, for the real data, we use the benchmark Movielens 100K dataset along with the Yahoo News dataset: |
| Researcher Affiliation | Collaboration | Yue Kang (University of California, Davis); Cho-Jui Hsieh (Google and University of California, Los Angeles); Thomas C. M. Lee (University of California, Davis) |
| Pseudocode | Yes | Algorithm 1 Zooming TS algorithm with Restarts |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We run comprehensive experiments on both simulations and real-world datasets. Specifically, for the real data, we use the benchmark Movielens 100K dataset along with the Yahoo News dataset... Movielens 100K dataset: This dataset contains 100K ratings from 943 users on 1,682 movies. For data pre-processing, we utilize LIBPMF (Yu et al., 2014)... Yahoo News dataset: We downloaded the Yahoo Recommendation dataset R6A, which contains Yahoo data from May 1 to May 10, 2009... We transform the contextual information into a 6-dimensional vector based on the processing in (Chu et al., 2009). |
| Dataset Splits | Yes | We also include a warming-up period of length T1 in the beginning to guarantee sufficient exploration... Specifically, for LinUCB, LinTS, UCB-GLM, GLM-TSL and Laplace-TS, we choose it to be 118. For GLOC and SGD-TS, we set it as 45. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It only mentions running times in seconds without specifying the computational environment. |
| Software Dependencies | No | The paper mentions using LIBPMF (Yu et al., 2014) for data preprocessing but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | We run comprehensive experiments on both simulations and real-world datasets... We believe a large value of warm-up period T1 may abandon some useful information in practice, and hence we use T1 = T^{2/(p+3)} according to Theorem 4.2 in experiments. And we would restart our hyperparameter tuning layer after every T2 = 3T^{(p+2)/(p+3)} rounds... The time horizon T is set to 14,000. For linear models, the expected reward of arm a is formulated as x_{t,a}^T θ and random noise is sampled from N(0, 0.5)... Each experiment is repeated 20 times. |
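
The warm-up and restart schedule quoted in the Experiment Setup row can be computed directly from the two formulas. Below is a minimal sketch; the interpretation of `p` as the number of hyperparameters being tuned is an assumption (under it, p = 1 reproduces the warm-up of 118 quoted for LinUCB-style algorithms and p = 2 the 45 quoted for GLOC and SGD-TS, given T = 14,000):

```python
import math

def tuning_schedule(T, p):
    """Warm-up length T1 and restart interval T2 per the quoted setup:
    T1 = T^{2/(p+3)} (Theorem 4.2) and T2 = 3 * T^{(p+2)/(p+3)},
    both rounded down to whole rounds. `p` is assumed to be the
    number of hyperparameters being tuned."""
    T1 = math.floor(T ** (2 / (p + 3)))
    T2 = math.floor(3 * T ** ((p + 2) / (p + 3)))
    return T1, T2
```

For example, `tuning_schedule(14000, 1)` gives a warm-up of 118 rounds, matching the value reported for LinUCB, LinTS, UCB-GLM, GLM-TSL and Laplace-TS.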
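
The Pseudocode row names "Algorithm 1: Zooming TS algorithm with Restarts". The following is a heavily simplified one-dimensional sketch of that idea, not the paper's exact algorithm: active candidate points carry empirical means and confidence radii, a Thompson-style Gaussian index picks the next point, uncovered regions spawn new candidates, and the statistics are cleared at each restart. All constants and the covering rule are illustrative assumptions.

```python
import math
import random

def zooming_ts_restart(sample_reward, T, restart_every=None, seed=0):
    """Simplified sketch of zooming Thompson sampling with restarts
    over the hyperparameter interval [0, 1].

    `sample_reward(x)` returns a noisy reward for candidate value x.
    Returns the list of points played at each round.
    """
    rng = random.Random(seed)
    stats = {}        # active point -> [pull count, reward sum]
    choices = []
    for t in range(1, T + 1):
        # periodic restart: forget all statistics (illustrative rule)
        if restart_every and t > 1 and t % restart_every == 1:
            stats.clear()
        if not stats:
            stats[rng.random()] = [0, 0.0]   # seed one active point

        def radius(v):
            # confidence radius shrinks as a point is pulled more often
            n = stats[v][0]
            return 1.0 if n == 0 else math.sqrt(2 * math.log(T) / n)

        # "zoom in": activate a candidate not covered by any active ball
        x = rng.random()
        if all(abs(x - v) > radius(v) for v in stats):
            stats[x] = [0, 0.0]

        def ts_index(v):
            # Thompson-style index: Gaussian sample around empirical mean
            n, s = stats[v]
            mean = s / n if n else 0.0
            return rng.gauss(mean, radius(v))

        v = max(stats, key=ts_index)
        reward = sample_reward(v)
        stats[v][0] += 1
        stats[v][1] += reward
        choices.append(v)
    return choices
```

A toy use: `zooming_ts_restart(lambda x: 1 - abs(x - 0.7) + random.gauss(0, 0.1), 200, restart_every=80)` tunes a single hyperparameter whose (unknown) reward peaks at 0.7.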