Optimum-statistical Collaboration Towards General and Efficient Black-box Optimization

Authors: Wenjie Li, Chi-Hua Wang, Guang Cheng, Qifan Song

TMLR 2023

Reproducibility: Variable | Result | LLM Response
Research Type | Experimental | In this section, we empirically compare the proposed VHCT algorithm with the existing anytime black-box optimization algorithms, including T-HOO (the truncated version of HOO), HCT, POO, and PCT (POO + HCT, Shang et al., 2019), and the Bayesian Optimization algorithm BO (Frazier, 2018) to validate that the proposed variance-adaptive uncertainty quantifier makes the convergence of VHCT faster than that of non-adaptive algorithms. We run every algorithm for 20 independent trials in each experiment and plot the average cumulative regret with 1-standard-deviation error bounds. The experimental details and additional numerical results on other objectives are provided in Appendix E.
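The evaluation protocol quoted above (20 independent trials, average cumulative regret with a 1-standard-deviation band) can be sketched as follows. This is a minimal illustration using placeholder per-round regret values, not data or code from the paper:

```python
import numpy as np

# Placeholder data: 20 independent trials of 1000 rounds of nonnegative
# per-round (instantaneous) regret. A real run would record the regret
# produced by each optimizer at every round.
rng = np.random.default_rng(0)
per_round_regret = rng.uniform(0.0, 1.0, size=(20, 1000))

# Cumulative regret per trial, then mean and 1-std band across the 20 trials.
cumulative = np.cumsum(per_round_regret, axis=1)  # shape (n_trials, n_rounds)
mean_regret = cumulative.mean(axis=0)             # curve to plot
std_regret = cumulative.std(axis=0)               # width of the error band
lower, upper = mean_regret - std_regret, mean_regret + std_regret
```

The `(lower, upper)` band is what a plotting call such as matplotlib's `fill_between` would shade around the mean curve.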
Researcher Affiliation | Academia | Wenjie Li, Department of Statistics, Purdue University; Chi-Hua Wang, Department of Statistics, University of California, Los Angeles; Guang Cheng, Department of Statistics, University of California, Los Angeles; Qifan Song, Department of Statistics, Purdue University
Pseudocode | Yes | Algorithm 1: Optimum-Statistical Collaboration (OSC); Algorithm 2: VHCT Algorithm (Short Version); Algorithm 3: VHCT Algorithm (Complete); Algorithm 4: Pull Update; Algorithm 5: Update Backward
Open Source Code | No | For the implementation of all the algorithms, we utilize the publicly available code of POO and HOO at the link https://rdrr.io/cran/OOR/man/POO.html and the PyXAB library (Li et al., 2023). While the PyXAB library is co-authored by some of the authors of this paper, the paper does not explicitly state that the specific implementation of VHCT (the novel algorithm proposed in this paper) is available as part of this library or elsewhere.
Open Datasets | Yes | We tune the RBF kernel and the L2 regularization parameters when training a Support Vector Machine (SVM) on the Landmine dataset (Liu et al., 2007), and the batch size, the learning rate, and the weight decay when training neural networks on the MNIST dataset (Deng, 2012). The Landmine dataset is available at http://www.ee.duke.edu/~lcarin/Landmine_Data.zip, and the MNIST dataset can be downloaded from http://yann.lecun.com/exdb/mnist/
Dataset Splits | No | The paper mentions using a "training set" and "testing set" for both the Landmine and MNIST datasets, but it does not provide specific details on how these splits were created, such as percentages, sample counts, or the partitioning methodology. For example, for the Landmine dataset: "The model is trained on the training set with the selected hyper-parameter and then evaluated on the testing set."
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU specifications, or memory amounts used for running the experiments. It only describes the experimental setup in terms of software and parameters.
Software Dependencies | No | The paper mentions using "the publicly available code of POO and HOO" and "the PyXAB library (Li et al., 2023)" but does not specify version numbers for these software components. For example: "For the implementation of all the algorithms, we utilize the publicly available code of POO and HOO at the link https://rdrr.io/cran/OOR/man/POO.html and the PyXAB library (Li et al., 2023)."
Experiment Setup | Yes | We run every algorithm for 20 independent trials in each experiment and plot the average cumulative regret with 1-standard-deviation error bounds. For all the experiments in Section 5 and Appendix E.2, we have used a low-noise setting where ϵ_t ∼ Uniform(−0.05, 0.05) to verify the advantage of VHCT. In general, ρ = 0.75 or ρ = 0.5 are good choices for VHCT and HCT, and ρ = 0.25 is a good choice for T-HOO. Therefore, we use these parameter settings in the real-life experiments and the additional experiments in the next subsection. For POO and PCT, we follow Grill et al. (2015) and use ρmax = 0.9. The unknown bound b is set to b = 1 for all the algorithms used in the experiments. We tune two hyper-parameters when training the SVM: the RBF kernel parameter from [0.01, 10] and the L2 regularization from [1e-4, 10]... We tune three different hyper-parameters of SGD, specifically the mini-batch size from [1, 100], the learning rate from [1e-6, 1], and the weight decay from [1e-6, 5e-1].
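The hyper-parameter search domains quoted in this row can be written down as simple box-constrained search spaces. The dictionary keys and the `sample_config` helper below are illustrative names of our own, not identifiers from the paper or from PyXAB:

```python
import numpy as np

# Search domains quoted from the paper's experiment setup, as (low, high) intervals.
svm_domain = {
    "rbf_kernel_param": (0.01, 10.0),  # RBF kernel parameter
    "l2_regularization": (1e-4, 10.0), # L2 regularization strength
}
sgd_domain = {
    "batch_size": (1.0, 100.0),        # a real run would round this to an int
    "learning_rate": (1e-6, 1.0),
    "weight_decay": (1e-6, 5e-1),
}

def sample_config(domain, rng):
    """Draw one candidate uniformly from each (low, high) interval.

    A black-box optimizer such as VHCT would propose points adaptively
    instead of sampling uniformly; this only illustrates the domains.
    """
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in domain.items()}

rng = np.random.default_rng(0)
cfg = sample_config(sgd_domain, rng)
```

Each sampled configuration would then be scored by training the model and evaluating it on the held-out test set, with that score fed back to the optimizer.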