Transfer Learning for Bayesian Optimization on Heterogeneous Search Spaces

Authors: Zhou Fan, Xinran Han, Zi Wang

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our theoretical and empirical results demonstrate the validity of MPHD and its superior performance on challenging black-box function optimization tasks. ... To verify the usefulness of MPHD for BO (§4), we conducted extensive experiments on real-world BO transfer learning problems with heterogeneous search spaces. We tested benchmarks including HPO-B (Pineda-Arango et al., 2021) and PD1 (Wang et al., 2022), which involve 17 search spaces in total. Our results have shown significant improvement made by MPHD on sample efficiency for BO on functions with unseen search spaces.
Researcher Affiliation | Collaboration | Zhou Fan (EMAIL), Harvard University; Xinran Han (EMAIL), Harvard University; Zi Wang (EMAIL), Google DeepMind
Pseudocode | Yes | Algorithm 1: MPHD pre-training and Bayesian optimization with acquisition function ac(·; θf).
Open Source Code | Yes | Our code for the experiments is built upon the codebase of Hyper BO (Wang et al., 2022) and is available at https://github.com/Evensgn/hyperbo-mphd.
Open Datasets | Yes | We tested benchmarks including HPO-B (Pineda-Arango et al., 2021) and PD1 (Wang et al., 2022), which involve 17 search spaces in total. ... For HPO-B Super-dataset and PD1 Dataset, we normalized the range of every domain dimension as well as function values to [0, 1].
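The per-dimension normalization to [0, 1] described in the quote above amounts to a min-max rescaling of both inputs and function values. The sketch below is a hypothetical illustration of that preprocessing (the function name and the constant-dimension guard are assumptions, not the authors' code):

```python
import numpy as np

def minmax_normalize(X, y):
    """Rescale each domain dimension of X and the function values y to [0, 1].

    Illustrative sketch of the min-max normalization described for the
    HPO-B and PD1 benchmarks; not the paper's actual preprocessing code.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    # Guard against constant dimensions to avoid division by zero.
    x_range = np.where(x_max > x_min, x_max - x_min, 1.0)
    X_norm = (X - x_min) / x_range
    y_range = y.max() - y.min() if y.max() > y.min() else 1.0
    y_norm = (y - y.min()) / y_range
    return X_norm, y_norm
```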
Dataset Splits | Yes | Train/test splits: For any super-dataset D = {(Di, Si)}_{i=1}^N of the two, we split every dataset Di in the super-dataset into a training dataset Di^train and a test dataset Di^test, each containing a disjoint subset of sub-datasets in Di. As mentioned in §3.2.2, we used 80% of the sub-datasets within each dataset as the training sub-datasets and the remaining 20% as the test sub-datasets for Synthetic Super-dataset (L). HPO-B Super-dataset comes with a pre-specified per-dataset train/test split and we used the same setup. ... We randomly sampled 19 (≈80%) of the remaining 23 sub-datasets as training sub-datasets and used the remaining 4 (≈20%) sub-datasets as test sub-datasets.
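The per-dataset 80%/20% split described above can be sketched as a random, disjoint partition of each dataset's sub-datasets. Everything here (the function name, the seed, and rounding the training count up so that 23 sub-datasets yield a 19/4 split) is an assumption for illustration, not the paper's code:

```python
import math
import random

def split_subdatasets(sub_datasets, train_frac=0.8, seed=0):
    """Randomly partition a dataset's sub-datasets into disjoint train/test sets.

    Hypothetical sketch of the 80%/20% per-dataset split quoted above.
    Rounding up matches a 19-train / 4-test split for 23 sub-datasets.
    """
    rng = random.Random(seed)
    indices = list(range(len(sub_datasets)))
    rng.shuffle(indices)
    n_train = math.ceil(train_frac * len(sub_datasets))
    train = [sub_datasets[i] for i in indices[:n_train]]
    test = [sub_datasets[i] for i in indices[n_train:]]
    return train, test
```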
Hardware Specification | No | The paper mentions using 'Microsoft Azure credits' and 'Google Cloud Platform Credit Awards' but does not specify the exact hardware configurations (e.g., GPU models, CPU types, memory) used within these cloud platforms for running the experiments.
Software Dependencies | No | The paper mentions using the Adam optimizer and the L-BFGS optimizer, and references a codebase, but does not provide specific version numbers for any software libraries or dependencies used in the implementation.
Experiment Setup | Yes | For all of the following experiments, the budget for BO is 100, and there is a set of 5 initial observations that are randomly sampled for each of the 5 random seeds. The acquisition function used for all GP-based methods is Probability of Improvement (PI) (Kushner, 1964) with target value max(yt) + 0.1. ... The number of iterations for the Adam optimizer is 20000, and each sub-dataset is randomly sub-sampled to 50 observations at each iteration. The learning rate of the Adam optimizer is 0.001. ... The optimizer for this re-training is L-BFGS and the number of iterations is 100.
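The PI acquisition with target max(yt) + 0.1 quoted above can be written in closed form under a Gaussian posterior: PI(x) = Φ((μ(x) − target) / σ(x)). The sketch below is a minimal single-point illustration of that setup (the function name and the numerical floor on σ are assumptions; this is not the Hyper BO implementation):

```python
import math

def probability_of_improvement(mu, sigma, y_best, margin=0.1):
    """Probability of Improvement (Kushner, 1964) with target y_best + margin.

    mu, sigma: GP posterior mean and standard deviation at one candidate
    point; y_best is the incumbent max(yt). Illustrative sketch only.
    """
    target = y_best + margin
    sigma = max(sigma, 1e-12)  # numerical floor for near-deterministic points
    z = (mu - target) / sigma
    # Standard normal CDF via the error function: Phi(z) = (1 + erf(z/sqrt(2))) / 2.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

Candidates whose posterior mean equals the target score exactly 0.5, and the score increases monotonically with the posterior mean, as expected for PI.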