Multiobjective Tree-Structured Parzen Estimator

Authors: Yoshihiko Ozaki, Yuki Tanigaki, Shuhei Watanabe, Masahiro Nomura, Masaki Onishi

JAIR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate that MOTPE approximates the Pareto fronts of a variety of benchmark problems and a convolutional neural network design problem better than existing methods through the numerical results. We also investigate how the configuration of MOTPE affects the behavior and the performance of the method and the effectiveness of asynchronous parallelization of the method based on the empirical results."
Researcher Affiliation | Collaboration | Yoshihiko Ozaki (EMAIL): Artificial Intelligence Research Center, AIST, Tokyo, Japan, and GREE, Inc., Tokyo, Japan; Yuki Tanigaki (EMAIL): Artificial Intelligence Research Center, AIST, Tokyo, Japan; Shuhei Watanabe (EMAIL): University of Freiburg, Freiburg, Germany; Masahiro Nomura (EMAIL): CyberAgent, Inc., Tokyo, Japan; Masaki Onishi (EMAIL): Artificial Intelligence Research Center, AIST, Tokyo, Japan
Pseudocode | Yes | Algorithm 1: Tree-structured Parzen Estimator; Algorithm 2: Split Observations; Algorithm 3: Greedy Hypervolume Subset Selection; Algorithm 4: Multiobjective Tree-structured Parzen Estimator; Algorithm 5: Asynchronous Parallel MOTPE; Algorithm 6: Worker for Asynchronous Parallel MOTPE
Open Source Code | Yes | "The code is available at https://doi.org/10.5281/zenodo.6258358. The singularity container image which we used to run our code and the experimental data of Section 5 are available upon request."
Open Datasets | Yes | "The WFG benchmark suite (Huband, Barone, While, & Hingston, 2005; Huband, Hingston, Barone, & While, 2006) that consists of nine problems was used to analyze the fundamental performance of MOTPE. ... the classification error rate for the CIFAR-10 dataset (Krizhevsky, 2009)"
Dataset Splits | Yes | "The number of initial observations ni was set to 11n - 1 (i.e., 32 for n = 3 and 98 for n = 9), and the initial solutions were sampled using the Latin hypercube sampling (McKay et al., 1979). ... The error rate was measured on a set of 10,000 images extracted from the training set. The rest of the training data, 40,000 images, were used for training."
Hardware Specification | No | "Computational resource of AI Bridging Cloud Infrastructure (ABCI) provided by National Institute of Advanced Industrial Science and Technology (AIST) was used. ... although np = 4 is relatively small, this is a realistic setting as it is not easy to prepare dozens of graphics processing units."
Software Dependencies | Yes | "We implemented MOTPE by modifying the TPE implementation of Optuna (version 2.0.0) (Akiba et al., 2019) whereas we used Spearmint (Snoek et al., 2012) for ParEGO, SMS-EGO and PESMO, and HyperMapper 2.0 (version 2.2.3) for Bayesian optimization with a random forests-based surrogate because Optuna does not provide these algorithms. ... The CNNs were implemented in the tf.keras (TensorFlow version 2.2.0) library and trained using the SGD optimizer with a batch size of 32 during 50 epochs."
Experiment Setup | Yes | "We set γ = 0.10, nc = 24, and the scales for all parameters to uniform for MOTPE. ... The settings for Spearmint are shown in Table 1. ... The evaluation budget (including initial evaluations) was set to 1,000, the number of initial observations was set to 100, γ was set to 0.10, and nc was set to 24 for MOTPE. On the other hand, the evaluation budget for NSGA-II was set to 10,000. The remaining settings for NSGA-II were set to population size = 100, mutation prob = 1/n, crossover prob = 0.9, and swapping prob = 0.5. ... For MOTPE, we set γ = 0.10, nc = 24, the parameter scales for Number of units and SGD learning rate to log-uniform, and those for the rest of the numerical parameters to uniform. For Spearmint, we set likelihood = GAUSSIAN because the problem is noisy. Additionally, we marked Dropout rate, SGD learning rate, and SGD momentum as to ignore for the second objective because our second objective does not depend on these parameters. ... The CNNs were implemented in the tf.keras (TensorFlow version 2.2.0) library and trained using the SGD optimizer with a batch size of 32 during 50 epochs."
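Several of the listed components revolve around hypervolume-based selection (Algorithm 3, Greedy Hypervolume Subset Selection). As a minimal illustration of the quantity that selection is based on, and not a reconstruction of the paper's implementation, the two-objective hypervolume indicator for a minimization problem can be sketched in pure Python; the function name and reference-point values below are illustrative assumptions:

```python
def hypervolume_2d(points, ref):
    """Hypervolume (area) dominated by a set of 2-objective points,
    under minimization, with respect to a reference point `ref`.

    Illustrative sketch only; names and inputs are assumptions, not
    taken from the MOTPE paper.
    """
    # Keep only points that strictly dominate the reference point.
    pts = [p for p in points if p[0] < ref[0] and p[1] < ref[1]]
    # Sort by the first objective, then keep the non-dominated front:
    # along increasing f1, a front point must strictly improve f2.
    pts.sort()
    front = []
    best_f2 = float("inf")
    for f1, f2 in pts:
        if f2 < best_f2:
            front.append((f1, f2))
            best_f2 = f2
    # Sum axis-aligned slabs, sweeping from largest f1 to smallest:
    # each front point contributes a rectangle of width (prev_f1 - f1)
    # and height (ref_f2 - f2).
    hv = 0.0
    prev_f1 = ref[0]
    for f1, f2 in reversed(front):
        hv += (prev_f1 - f1) * (ref[1] - f2)
        prev_f1 = f1
    return hv
```

For example, `hypervolume_2d([(0.0, 1.0), (1.0, 0.0)], (2.0, 2.0))` returns 3.0: each point dominates an area of 2 and their overlap is 1. A greedy subset selection in the spirit of Algorithm 3 would repeatedly add the candidate point that increases this value the most.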