Budgeted-Bandits with Controlled Restarts with Applications in Learning and Computing
Authors: Semih Cayci, Yilin Zheng, Atilla Eryilmaz
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Furthermore, through numerical studies, we verified the applicability of our algorithm in the diverse contexts of: (i) algorithm portfolios for SAT solvers; (ii) task scheduling in wireless networks; and (iii) hyperparameter tuning in neural network training. |
| Researcher Affiliation | Collaboration | Semih Cayci, Department of Mathematics, RWTH Aachen University; Yilin Zheng, Google; Atilla Eryilmaz, Department of Electrical and Computer Engineering, The Ohio State University |
| Pseudocode | Yes | Algorithm 1: Asymptotically-Optimal Offline Policy π_off; Algorithm 2: Online Learning Algorithms for Finite Set of Restart Times, UCB-RM (π_M) and UCB-RB (π_B); Algorithm 3: Online Learning Algorithm for Continuous Set of Restart Times, UCB-RC (π_C) |
| Open Source Code | No | The paper does not explicitly state that source code for the methodology is provided or include any links to code repositories. |
| Open Datasets | Yes | We evaluated the performance of the meta-algorithms over the widely used Uniform Random-3-SAT benchmark set of satisfiable problem instances in the SATLIB library (Hoos & Stützle, 2000). Specifically, in our setup, we trained a ResNet-16 over the CIFAR-10 dataset |
| Dataset Splits | Yes | Specifically, in our setup, we trained a ResNet-16 over the CIFAR-10 dataset, where we used an 80:20 split for the training and testing sets. |
| Hardware Specification | No | The paper mentions "GPU time" as a general resource and "personal computer" for average running time, but does not provide specific hardware details like GPU/CPU models or memory amounts used for experiments. |
| Software Dependencies | No | The paper mentions "SGD optimizer" but does not specify any software libraries or frameworks with version numbers that were used for the implementation. |
| Experiment Setup | Yes | For these experiments, the restart times are finite. Therefore, we used the UCB-RB Algorithm with α = 2.01 and (1 + β)²/(1 − β) = 1.01. For initialization, the controller performed 40 trials for each (k, t_l) decision. The learning rate is set to 0.001 using an SGD optimizer with a batch size of 64. We choose 0.9 as the reward threshold. |
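The setup row above names the UCB-RB algorithm over a finite set of restart times with α = 2.01 and a round-robin initialization phase. The sketch below is a minimal illustration of that style of controller, not the authors' implementation: the `run_trial` interface, the reward/cost model, and the exact UCB index on the reward-to-cost ratio are assumptions made for the example.

```python
import math
import random

def ucb_restart_bandit(restart_times, run_trial, horizon,
                       alpha=2.01, init_trials=40):
    """Illustrative UCB controller over a finite set of restart times.

    run_trial(t) runs one trial with restart time t and returns
    (reward, cost). The index below (a UCB on the empirical
    reward-to-cost ratio) is an assumed form, not the paper's exact one.
    """
    K = len(restart_times)
    counts = [0] * K
    reward_sum = [0.0] * K
    cost_sum = [0.0] * K
    history = []

    for n in range(horizon):
        if n < K * init_trials:
            k = n % K  # round-robin initialization phase
        else:
            def index(k):
                mean_r = reward_sum[k] / counts[k]
                mean_c = cost_sum[k] / counts[k]
                bonus = math.sqrt(alpha * math.log(n + 1) / counts[k])
                # Optimistic reward over pessimistic cost (clipped).
                return (mean_r + bonus) / max(mean_c - bonus, 1e-9)
            k = max(range(K), key=index)

        r, c = run_trial(restart_times[k])
        counts[k] += 1
        reward_sum[k] += r
        cost_sum[k] += c
        history.append(k)

    return counts, history


if __name__ == "__main__":
    random.seed(0)

    def run_trial(t):
        # Hypothetical environment: short restarts succeed more often
        # and cost less, so restart time 1.0 is the better arm.
        p = 0.9 if t == 1.0 else 0.1
        reward = 1.0 if random.random() < p else 0.0
        return reward, t

    counts, _ = ucb_restart_bandit([1.0, 2.0], run_trial,
                                   horizon=500, init_trials=5)
    print(counts)
```

In this toy run the controller concentrates its trials on the cheaper, more reliable restart time after the initialization phase, which is the qualitative behavior a budgeted restart bandit is designed to exhibit.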