Offline-to-Online Hyperparameter Transfer for Stochastic Bandits

Authors: Dravyansh Sharma, Arun Suggala

AAAI 2025

Reproducibility assessment (variable: result, with supporting evidence):
Research Type: Experimental. Evidence: "Our experiments indicate the significance and effectiveness of transferring hyperparameters from offline problems to online learning with stochastic bandit feedback. In this section, we provide empirical evidence for the significance of our hyperparameter transfer framework on real and synthetic data."
Researcher Affiliation: Collaboration. Dravyansh Sharma (1: TTIC), Arun Suggala (2: Google DeepMind). EMAIL, EMAIL
Pseudocode: Yes. The paper lists five algorithms:
- Algorithm 1: UCB(α). Input: arms {1, ..., n}, max steps T. Output: arm pulls {A_t ∈ [n]}, t ∈ [T].
- Algorithm 2: TUNEDUCB(α_min, α_max). Input: parameter interval [α_min, α_max]; arm rewards r_ijk, i ∈ [n], j ∈ [T], k ∈ [N], from offline data. Output: learned parameter α̂.
- Algorithm 3: α-CRITICALPOINTS(α_l, α_h, t[n], µ[n], R[n]). Input: parameter interval [α_l, α_h]; arm pulls so far t_i; mean rewards so far µ_i; future arm rewards R_i, i ∈ [n]. Output: learned parameter α̂.
- Algorithm 4: LINUCB(α). Input: arms {1, ..., n}, max steps T, feature dimension d. Output: arm pulls {A_t ∈ [n]}, t ∈ [T].
- Algorithm 5: GP-UCB(σ²) (Srinivas et al. 2010). Input: input space C; GP prior µ_0 = 0, σ_0; kernel k(·, ·) such that k(x, x′) ≤ 1 for any x, x′ ∈ C; {β_t}, t ∈ [T]. Output: points {x_t ∈ C}, t ∈ [T].
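For context, the UCB(α) index rule that Algorithms 1-3 tune can be sketched as follows. This is a minimal illustrative implementation, not the paper's code; `pull` is a hypothetical reward oracle supplied by the caller, and the index form (empirical mean plus a √(α·log t / t_i) bonus) is the standard one the α parameter scales.

```python
import math
import random

def ucb(n_arms, T, pull, alpha=2.0):
    """Minimal UCB(alpha) sketch. `pull(arm)` returns a stochastic
    reward in [0, 1]; alpha scales the exploration bonus."""
    counts = [0] * n_arms      # t_i: pulls of arm i so far
    means = [0.0] * n_arms     # mu_i: empirical mean reward of arm i
    pulls = []
    for t in range(1, T + 1):
        if t <= n_arms:
            arm = t - 1        # pull each arm once to initialize
        else:
            # index = empirical mean + sqrt(alpha * log t / t_i)
            arm = max(range(n_arms),
                      key=lambda i: means[i]
                      + math.sqrt(alpha * math.log(t) / counts[i]))
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # running mean
        pulls.append(arm)
    return pulls
```

A larger α forces more exploration of under-pulled arms; tuning it from offline tasks is exactly what the transfer framework targets.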
Open Source Code: No. The paper discusses various algorithms and their performance but does not provide an explicit statement about releasing its own source code or a link to a repository.
Open Datasets: Yes. Evidence: "We present our results for CIFAR-10 and CIFAR-100 (Krizhevsky 2009) benchmark image classification datasets."
Dataset Splits: Yes. Evidence: "For each dataset we run Algorithm 2 over N = 200 training/offline tasks with time horizon T_o = 20."
Hardware Specification: Yes. Evidence: "All our experiments on CIFAR are run on 1 Nvidia A100 GPU."
Software Dependencies: No. The paper mentions training neural networks via SGD (stochastic gradient descent) but does not provide specific software names with version numbers (e.g., Python, PyTorch, TensorFlow).
Experiment Setup: Yes. Evidence: "The arms consist of 11 different learning rates (0.001, 0.002, 0.004, 0.006, 0.008, 0.01, 0.05, 0.1, 0.2, 0.4, 0.8), and the arm reward is given by the classification accuracy of feedforward neural networks trained via SGD (stochastic gradient descent) with that learning rate and a batch size of 64 for 20 epochs. For each dataset we run Algorithm 2 over N = 200 training/offline tasks with time horizon T_o = 20, and run corralling for a grid of ten hyperparameter values α ∈ {0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100}."
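The offline-to-online transfer step in this setup can be sketched as a replay-based grid search over α. The sketch below is illustrative only: the data layout (`offline_rewards[k][i][j]` as the j-th reward of arm i in offline task k) and the function name are assumptions, and the paper's Algorithm 2 tunes over a continuous interval via critical points rather than a fixed grid.

```python
import math

def tune_alpha_on_grid(offline_rewards, alpha_grid):
    """Replay UCB(alpha) on each offline task for every candidate
    alpha; return the alpha with the best average cumulative reward.
    offline_rewards[k][i][j]: reward of arm i at its j-th pull, task k."""
    n_tasks = len(offline_rewards)
    n = len(offline_rewards[0])        # number of arms
    T = len(offline_rewards[0][0])     # time horizon per task

    def replay(alpha, task):
        counts, means, total = [0] * n, [0.0] * n, 0.0
        for t in range(1, T + 1):
            if t <= n:
                arm = t - 1            # initialize: pull each arm once
            else:
                arm = max(range(n),
                          key=lambda i: means[i]
                          + math.sqrt(alpha * math.log(t) / counts[i]))
            r = task[arm][counts[arm]]  # next stored reward for this arm
            counts[arm] += 1
            means[arm] += (r - means[arm]) / counts[arm]
            total += r
        return total

    return max(alpha_grid,
               key=lambda a: sum(replay(a, t) for t in offline_rewards)
               / n_tasks)
```

The learned α̂ is then deployed on the online task, which is the transfer the experiments evaluate (with N = 200 tasks and T_o = 20 in the paper's CIFAR runs).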