Budgeted-Bandits with Controlled Restarts with Applications in Learning and Computing

Authors: Semih Cayci, Yilin Zheng, Atilla Eryilmaz

TMLR 2025

Reproducibility checklist (variable, result, and LLM response):
Research Type: Experimental
"Furthermore, through numerical studies, we verified the applicability of our algorithm in the diverse contexts of: (i) algorithm portfolios for SAT solvers; (ii) task scheduling in wireless networks; and (iii) hyperparameter tuning in neural network training."
Researcher Affiliation: Collaboration
Semih Cayci (Department of Mathematics, RWTH Aachen University); Yilin Zheng (Google); Atilla Eryilmaz (Department of Electrical and Computer Engineering, The Ohio State University)
Pseudocode: Yes
Algorithm 1: Asymptotically-Optimal Offline Policy π_off
Algorithm 2: Online Learning Algorithms for a Finite Set of Restart Times, UCB-RM (π_M) and UCB-RB (π_B)
Algorithm 3: Online Learning Algorithm for a Continuous Set of Restart Times, UCB-RC (π_C)
Open Source Code: No
The paper does not explicitly state that source code for the methodology is provided, nor does it include any links to code repositories.
Open Datasets: Yes
"We evaluated the performance of the meta-algorithms over the widely used Uniform Random-3-SAT benchmark set of satisfiable problem instances in the SATLIB library (Hoos & Stützle, 2000)." "Specifically, in our setup, we trained a ResNet-16 over the CIFAR-10 dataset"
Dataset Splits: Yes
"Specifically, in our setup, we trained a ResNet-16 over the CIFAR-10 dataset where we used an 80:20 split for the training set and testing set."
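The 80:20 split quoted above can be reproduced with a few lines of standard Python. This is a minimal sketch, not the authors' actual data pipeline; the dataset size is CIFAR-10's 50,000 training images, and the seed and helper name are illustrative choices.

```python
import random

def split_indices(n, train_frac=0.8, seed=0):
    """Shuffle indices 0..n-1 and cut them into train/test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed for a reproducible split
    cut = int(train_frac * n)
    return idx[:cut], idx[cut:]

# CIFAR-10 provides 50,000 training images; an 80:20 split yields 40,000/10,000.
train_idx, test_idx = split_indices(50_000)
print(len(train_idx), len(test_idx))  # 40000 10000
```

In practice one would pass these index lists to a dataset wrapper (e.g. a subset sampler) so that the training and testing sets never overlap.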
Hardware Specification: No
The paper mentions "GPU time" as a general resource and a "personal computer" for average running time, but does not provide specific hardware details such as GPU/CPU models or memory amounts used for the experiments.
Software Dependencies: No
The paper mentions an "SGD optimizer" but does not specify any software libraries or frameworks, with version numbers, used for the implementation.
Experiment Setup: Yes
"For these experiments, the restart times are finite. Therefore, we used the UCB-RB Algorithm with α = 2.01, (1 + β)^2/(1 − β) = 1.01. For initialization, the controller performed 40 trials for each (k, t_l) decision. The learning rate is set to 0.001 using an SGD optimizer with a batch size of 64. We choose 0.9 as the reward threshold."
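To make the quoted setup concrete, the following is a minimal sketch of a generic UCB index computed over (arm, restart time) decisions, using the quoted α = 2.01 and the 40 initialization trials per decision. This is not the paper's UCB-RB algorithm itself (whose exact index also involves the β parameter and cost/reward budget accounting); the decision set, reward values, and bonus form here are illustrative assumptions.

```python
import math
import random

ALPHA = 2.01  # exploration parameter quoted in the experiment setup

def ucb_index(mean, count, t, alpha=ALPHA):
    """Generic UCB index: empirical mean plus an alpha-scaled exploration bonus."""
    return mean + math.sqrt(alpha * math.log(t) / count)

# Decisions are (arm k, restart time t_l) pairs; each is initialized with
# 40 trials, as in the quoted setup. Restart times here are placeholders.
decisions = [(k, tl) for k in range(2) for tl in (1.0, 2.0, 4.0)]
counts = {d: 40 for d in decisions}
means = {d: random.Random(d).random() for d in decisions}  # placeholder rewards

t = sum(counts.values())  # total trials so far
best = max(decisions, key=lambda d: ucb_index(means[d], counts[d], t))
```

After each subsequent round, the controller would play `best`, observe a reward, update `means[best]` and `counts[best]`, and recompute the indices; less-explored decisions keep a larger bonus and are revisited occasionally.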