ADMMBO: Bayesian Optimization with Unknown Constraints using ADMM

Authors: Setareh Ariafar, Jaume Coll-Font, Dana Brooks, Jennifer Dy

JMLR 2019

Reproducibility variables, results, and supporting LLM responses:
Research Type: Experimental
LLM Response: "Extensive experiments on a number of challenging BO benchmark problems show that our proposed approach outperforms the state-of-the-art methods in terms of the speed of obtaining a feasible solution and convergence to the global optimum, as well as minimizing the total number of evaluations of the unknown objective and constraint functions."
Researcher Affiliation: Academia
LLM Response: "Setareh Ariafar, Electrical and Computer Engineering Department, Northeastern University, Boston, MA 02115, USA; Jaume Coll-Font, Computational Radiology Laboratory, Boston Children's Hospital, Boston, MA 02115, USA; Dana Brooks, Electrical and Computer Engineering Department, Northeastern University, Boston, MA 02115, USA; Jennifer Dy, Electrical and Computer Engineering Department, Northeastern University, Boston, MA 02115, USA"
Pseudocode: Yes
LLM Response: "Algorithm 3.1 (ADMMBO), Algorithm 3.2 (OPT), Algorithm 3.3 (FEAS)"
Open Source Code: Yes
LLM Response: "Please see our open-source code available at https://github.com/SetarehAr/ADMMBO for more details on each experiment."
Open Datasets: Yes
LLM Response: "We compare ADMMBO with four state-of-the-art constrained Bayesian optimization methods: EIC (Gelbart et al., 2014; Gardner et al., 2014), ALBO (Gramacy et al., 2016), Slack-AL (Picheny et al., 2016), and PESC (Hernández-Lobato et al., 2015). In our last experiment, we tune the hyperparameters of a three-hidden-layer fully connected neural network for a multiclass classification task using the MNIST dataset (LeCun, 1998; Hernández-Lobato et al., 2015)."
Dataset Splits: No
LLM Response: The paper mentions using the MNIST dataset and minimizing validation error, but it does not specify explicit training/validation/test split percentages or sample counts.
Hardware Specification: Yes
LLM Response: "We consider the optimization problem of finding a set of hyperparameters that minimize the validation error subject to the prediction time being smaller than or equal to 0.045 seconds on an NVIDIA Tesla K80 GPU."
Software Dependencies: No
LLM Response: "We build our network using Keras with a TensorFlow backend (Chollet et al., 2015; Abadi et al., 2016)." While Keras and TensorFlow are mentioned as software used, no version numbers for these components are provided.
Experiment Setup: Yes
LLM Response: "In all the synthetic problems discussed below, similar to (Hernández-Lobato et al., 2015; Picheny et al., 2016; Gramacy et al., 2016), we assume that f and c_i follow independent GP priors with zero mean and squared exponential kernels. For the problem of hyperparameter tuning in neural networks on the MNIST dataset, we assume that f and c_i follow independent GP priors with zero mean and Matérn 5/2 kernels (Hernández-Lobato et al., 2015). For ADMMBO, in all the experiments we set M ∈ {20, 50}, ρ = 0.1, ε = 0.01, δ = 0.05, and initialize y_i^1 and z_i^1 with the bounds of B. Further, in all the experiments, we set the total BO iteration budget to 100(N + 1), where N is the number of constraints of the optimization. We empirically observed that ADMMBO performed best when we assigned a higher BO budget to the first iteration of the algorithm. Thus, we set α^1 = β_i^1 ∈ {10, 20, 50} for the first iteration and α^k = β_i^k ∈ {2, 5} for the rest. Considering the total BO budget and the budgets for the optimality and feasibility subproblems, we set K accordingly. We initialize the datasets F and C_i with n = m_i = 2 points. We set μ = 10 and τ^incr = τ^decr = 2, similar to (Boyd et al., 2011; Hong and Luo, 2017). We consider the optimization problem of finding a set of hyperparameters that minimize the validation error subject to the prediction time being smaller than or equal to 0.045 seconds on an NVIDIA Tesla K80 GPU. Here, we focus on eleven hyperparameters: the learning rate, the decay rate, the momentum parameter, two dropout probabilities (for the input layer and the hidden layers), two regularization parameters (for the weight decay and the maximum weight value), the number of hidden units in each of the three hidden layers, and the choice of activation function (ReLU or sigmoid)."
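The quoted setup can be sketched in plain Python. The two kernel functions below follow the standard squared exponential and Matérn 5/2 definitions that the paper says it uses as GP prior covariances, and the budget rule implements the stated 100(N + 1) total. Function names, the unit lengthscale default, and the settings dictionary layout are illustrative assumptions, not taken from the authors' code.

```python
import math

def squared_exponential(x, y, lengthscale=1.0):
    # SE kernel used as the GP prior covariance in the synthetic problems:
    # k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2))
    r2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-r2 / (2.0 * lengthscale ** 2))

def matern52(x, y, lengthscale=1.0):
    # Matern 5/2 kernel used for the MNIST hyperparameter-tuning problem:
    # k(r) = (1 + s + s^2/3) * exp(-s), with s = sqrt(5) * r / lengthscale
    r = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    s = math.sqrt(5.0) * r / lengthscale
    return (1.0 + s + s ** 2 / 3.0) * math.exp(-s)

def total_bo_budget(num_constraints):
    # Total BO iteration budget quoted in the paper: 100 * (N + 1)
    # for an optimization problem with N unknown constraints.
    return 100 * (num_constraints + 1)

# ADMMBO settings quoted in the paper; M and the per-iteration BO budgets
# were chosen per experiment from the listed sets.
ADMMBO_SETTINGS = {
    "M": (20, 50),
    "rho": 0.1,
    "epsilon": 0.01,
    "delta": 0.05,
    "first_iter_budget": (10, 20, 50),  # alpha^1 = beta_i^1
    "later_iter_budget": (2, 5),        # alpha^k = beta_i^k, k > 1
    "mu": 10,
    "tau_incr": 2,
    "tau_decr": 2,
    "init_points": 2,                   # n = m_i = 2
}
```

Both kernels equal 1 at zero distance and decay with separation, so either can serve as a stationary GP prior covariance; the budget helper makes the 100(N + 1) rule explicit (e.g. a single-constraint problem gets 200 total BO iterations).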