ADMMBO: Bayesian Optimization with Unknown Constraints using ADMM
Authors: Setareh Ariafar, Jaume Coll-Font, Dana Brooks, Jennifer Dy
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on a number of challenging BO benchmark problems show that our proposed approach outperforms the state-of-the-art methods in terms of the speed of obtaining a feasible solution and convergence to the global optimum, as well as minimizing the total number of evaluations of the unknown objective and constraint functions. |
| Researcher Affiliation | Academia | Setareh Ariafar EMAIL Electrical and Computer Engineering Department, Northeastern University, Boston, MA 02115, USA; Jaume Coll-Font EMAIL Computational Radiology Laboratory, Boston Children's Hospital, Boston, MA 02115, USA; Dana Brooks EMAIL Electrical and Computer Engineering Department, Northeastern University, Boston, MA 02115, USA; Jennifer Dy EMAIL Electrical and Computer Engineering Department, Northeastern University, Boston, MA 02115, USA |
| Pseudocode | Yes | Algorithm 3.1 ADMMBO Algorithm 3.2 OPT Algorithm 3.3 FEAS |
| Open Source Code | Yes | Please see our open-source code available at https://github.com/SetarehAr/ADMMBO for more details on each experiment. |
| Open Datasets | Yes | We compare ADMMBO with four state-of-the-art constrained Bayesian optimization methods: EIC (Gelbart et al., 2014; Gardner et al., 2014), ALBO (Gramacy et al., 2016), Slack-AL (Picheny et al., 2016) and PESC (Hernández-Lobato et al., 2015). In our last experiment, we tune the hyperparameters of a three-hidden-layer fully connected neural network for a multiclass classification task using the MNIST dataset (LeCun, 1998; Hernández-Lobato et al., 2015). |
| Dataset Splits | No | The paper mentions using the MNIST dataset and minimizing validation error, but it does not specify explicit training/test/validation split percentages or sample counts for the dataset. |
| Hardware Specification | Yes | We consider the optimization problem of finding a set of hyperparameters that minimize the validation error subject to the prediction time being smaller than or equal to 0.045 second on NVIDIA Tesla K80 GPU. |
| Software Dependencies | No | We build our network using Keras with a TensorFlow backend (Chollet et al., 2015; Abadi et al., 2016). While Keras and TensorFlow are mentioned as software used, no specific version numbers for these components are provided. |
| Experiment Setup | Yes | In all the synthetic problems, discussed below, similar to (Hernández-Lobato et al., 2015; Picheny et al., 2016; Gramacy et al., 2016), we assume that f and c_i follow independent GP priors with zero mean and squared exponential kernels. For the problem of hyperparameter tuning in neural networks on the MNIST dataset, we assume that f and c_i follow independent GP priors with zero mean and with Matérn 5/2 kernels (Hernández-Lobato et al., 2015). For ADMMBO, in all the experiments we set M ∈ {20, 50}, ρ = 0.1, ϵ = 0.01, δ = 0.05 and initialize y_i^1 and z_i^1 with the bounds of B. Further, in all the experiments, we set the total BO iteration budget to 100(N + 1), where N is the number of constraints of the optimization. We empirically observed that ADMMBO performed best when we assigned a higher BO budget to the first iteration of the algorithm. Thus, we set α^1 = β_i^1 ∈ {10, 20, 50} for the first iteration and α^k = β_i^k ∈ {2, 5} for the rest. Considering the total BO budget and the budgets for the optimality and feasibility subproblems, we set K accordingly. We initialize datasets F and C_i with n = m_i = 2 points. We set μ = 10 and τ^incr = τ^decr = 2 similar to (Boyd et al., 2011; Hong and Luo, 2017). We consider the optimization problem of finding a set of hyperparameters that minimize the validation error subject to the prediction time being smaller than or equal to 0.045 second on an NVIDIA Tesla K80 GPU. Here, we focus on eleven hyperparameters: learning rate, decay rate, momentum parameter, two dropout probabilities for the input layer and the hidden layers, two regularization parameters for the weight decay and the weight maximum value, the number of hidden units in each of the 3 hidden layers, and the choice of activation function (ReLU or sigmoid). |
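The penalty settings quoted above (μ = 10, τ^incr = τ^decr = 2, initial ρ = 0.1) match the standard adaptive-penalty rule for ADMM from Boyd et al. (2011), which the paper cites. The sketch below is a minimal illustration of that generic rule, not code from the ADMMBO repository; the function name and residual arguments are assumptions.

```python
def update_rho(rho, primal_residual, dual_residual,
               mu=10.0, tau_incr=2.0, tau_decr=2.0):
    """Adaptive ADMM penalty update (Boyd et al., 2011, Sec. 3.4.1).

    Increase rho when the primal residual dominates the dual residual
    by more than a factor of mu; decrease it in the opposite case;
    otherwise leave it unchanged.
    """
    if primal_residual > mu * dual_residual:
        return rho * tau_incr
    if dual_residual > mu * primal_residual:
        return rho / tau_decr
    return rho

# With the paper's reported settings, starting from rho = 0.1:
rho = update_rho(0.1, primal_residual=5.0, dual_residual=0.1)  # -> 0.2
```

Keeping the primal and dual residuals within a factor of μ of each other makes ADMM's convergence less sensitive to the initial choice of ρ, which is presumably why the reported value ρ = 0.1 suffices across experiments.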