Offline-to-Online Hyperparameter Transfer for Stochastic Bandits
Authors: Dravyansh Sharma, Arun Suggala
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments indicate the significance and effectiveness of the transfer of hyperparameters from offline problems in online learning with stochastic bandit feedback. In this section, we provide empirical evidence for the significance of our hyperparameter transfer framework on real and synthetic data. |
| Researcher Affiliation | Collaboration | Dravyansh Sharma (1), Arun Suggala (2); (1) TTIC, (2) Google DeepMind; EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: UCB(α). Input: arms {1, …, n}, max steps T. Output: arm pulls {A_t ∈ [n]}_{t ∈ [T]}. Algorithm 2: TUNEDUCB(αmin, αmax). Input: parameter interval [αmin, αmax], arm rewards r_{ijk}, i ∈ [n], j ∈ [T], k ∈ [N], from offline data. Output: learned parameter α̂. Algorithm 3: α-CRITICALPOINTS(αl, αh, t[n], µ[n], R[n]). Input: parameter interval [αl, αh], arm pulls so far t_i, mean rewards so far µ_i, future arm rewards R_i, i ∈ [n]. Output: learned parameter α̂. Algorithm 4: LINUCB(α). Input: arms {1, …, n}, max steps T, feature dimension d. Output: arm pulls {A_t ∈ [n]}_{t ∈ [T]}. Algorithm 5: GP-UCB(σ²) (Srinivas et al. 2010). Input: input space C, GP prior µ₀ = 0, σ₀, kernel k(·, ·) such that k(x, x′) ≤ 1 for any x, x′ ∈ C, {β_t}_{t ∈ [T]}. Output: points {x_t ∈ C}_{t ∈ [T]}. |
| Open Source Code | No | The paper discusses various algorithms and their performance but does not provide an explicit statement about releasing its own source code or a link to a repository. |
| Open Datasets | Yes | We present our results for CIFAR-10 and CIFAR-100 (Krizhevsky 2009) benchmark image classification datasets. |
| Dataset Splits | Yes | For each dataset we run Algorithm 2 over N = 200 training/offline tasks with time horizon To = 20 |
| Hardware Specification | Yes | All our experiments on CIFAR are run on 1 Nvidia A100 GPU. |
| Software Dependencies | No | The paper mentions training neural networks via SGD (stochastic gradient descent) but does not provide specific software names with version numbers (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | The arms consist of 11 different learning rates (0.001, 0.002, 0.004, 0.006, 0.008, 0.01, 0.05, 0.1, 0.2, 0.4, 0.8), and the arm reward is given by the classification accuracy of feedforward neural networks trained via SGD (stochastic gradient descent) with that learning rate and a batch size of 64 for 20 epochs. For each dataset we run Algorithm 2 over N = 200 training/offline tasks with time horizon To = 20, and run corralling for a grid of ten hyperparameter values α = {0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100}. |
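The paper releases no code, but the α-parameterized UCB index rule that Algorithm 1 tunes is standard. Below is a minimal, hypothetical sketch of UCB(α) consistent with the table's setup (arms = learning rates, reward = accuracy); the exploration term `sqrt(alpha * log(t) / counts[i])` is the textbook form and is an assumption, not the paper's exact pseudocode.

```python
import math

def ucb_alpha(pull_arm, n_arms, horizon, alpha):
    """Sketch of UCB(alpha): larger alpha widens the confidence
    bonus and so forces more exploration. `pull_arm(i)` is a
    caller-supplied (hypothetical) reward oracle for arm i."""
    counts = [0] * n_arms    # t_i: number of pulls of each arm so far
    means = [0.0] * n_arms   # running mean reward per arm
    pulls = []
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1      # pull each arm once to initialize
        else:
            # UCB index: empirical mean plus alpha-scaled confidence width
            arm = max(range(n_arms),
                      key=lambda i: means[i]
                      + math.sqrt(alpha * math.log(t) / counts[i]))
        r = pull_arm(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean
        pulls.append(arm)
    return pulls
```

In the paper's transfer setting, Algorithm 2 would pick the α passed in here by minimizing regret over the N = 200 offline tasks rather than fixing it a priori.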