Offline-to-Online Hyperparameter Transfer for Stochastic Bandits
Authors: Dravyansh Sharma, Arun Suggala
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments indicate the significance and effectiveness of the transfer of hyperparameters from offline problems in online learning with stochastic bandit feedback. In this section, we provide empirical evidence for the significance of our hyperparameter transfer framework on real and synthetic data. |
| Researcher Affiliation | Collaboration | Dravyansh Sharma (1), Arun Suggala (2); (1) TTIC, (2) Google DeepMind; EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: UCB(α). Input: arms {1, …, n}, max steps T. Output: arm pulls {A_t ∈ [n]}_{t ∈ [T]}. Algorithm 2: TUNEDUCB(αmin, αmax). Input: parameter interval [αmin, αmax], arm rewards r_{ijk}, i ∈ [n], j ∈ [T], k ∈ [N], from offline data. Output: learned parameter α̂. Algorithm 3: α-CRITICALPOINTS(αl, αh, t[n], µ[n], R[n]). Input: parameter interval [αl, αh], arm pulls so far t_i, mean rewards so far µ_i, future arm rewards R_i, i ∈ [n]. Output: learned parameter α̂. Algorithm 4: LINUCB(α). Input: arms {1, …, n}, max steps T, feature dimension d. Output: arm pulls {A_t ∈ [n]}_{t ∈ [T]}. Algorithm 5: GP-UCB(σ²) (Srinivas et al. 2010). Input: input space C, GP prior µ₀ = 0, σ₀, kernel k(·, ·) such that k(x, x′) ≤ 1 for any x, x′ ∈ C, {β_t}_{t ∈ [T]}. Output: points {x_t ∈ C}_{t ∈ [T]}. |
| Open Source Code | No | The paper discusses various algorithms and their performance but does not provide an explicit statement about releasing its own source code or a link to a repository. |
| Open Datasets | Yes | We present our results for CIFAR-10 and CIFAR-100 (Krizhevsky 2009) benchmark image classification datasets. |
| Dataset Splits | Yes | For each dataset we run Algorithm 2 over N = 200 training/offline tasks with time horizon To = 20 |
| Hardware Specification | Yes | All our experiments on CIFAR are run on 1 Nvidia A100 GPU. |
| Software Dependencies | No | The paper mentions training neural networks via SGD (stochastic gradient descent) but does not provide specific software names with version numbers (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | The arms consist of 11 different learning rates (0.001, 0.002, 0.004, 0.006, 0.008, 0.01, 0.05, 0.1, 0.2, 0.4, 0.8), and the arm reward is given by the classification accuracy of feedforward neural networks trained via SGD (stochastic gradient descent) with that learning rate and a batch size of 64 for 20 epochs. For each dataset we run Algorithm 2 over N = 200 training/offline tasks with time horizon To = 20, and run corralling for a grid of ten hyperparameter values α = {0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100}. |
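The paper releases no code, but the α-parameterized UCB index rule that Algorithm 1 tunes is standard. Below is a minimal, hypothetical sketch of UCB(α) consistent with the table's setup (arms = learning rates, reward = accuracy); the exploration term `sqrt(alpha * log(t) / counts[i])` is the textbook form and is an assumption, not the paper's exact pseudocode.

```python
import math

def ucb_alpha(pull_arm, n_arms, horizon, alpha):
    """Sketch of UCB(alpha): larger alpha widens the confidence
    bonus and so forces more exploration. `pull_arm(i)` is a
    caller-supplied (hypothetical) reward oracle for arm i."""
    counts = [0] * n_arms    # t_i: number of pulls of each arm so far
    means = [0.0] * n_arms   # running mean reward per arm
    pulls = []
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1      # pull each arm once to initialize
        else:
            # UCB index: empirical mean plus alpha-scaled confidence width
            arm = max(range(n_arms),
                      key=lambda i: means[i]
                      + math.sqrt(alpha * math.log(t) / counts[i]))
        r = pull_arm(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean
        pulls.append(arm)
    return pulls
```

In the paper's transfer setting, Algorithm 2 would pick the α passed in here by minimizing regret over the N = 200 offline tasks rather than fixing it a priori.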