Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Modeling All Response Surfaces in One for Conditional Search Spaces

Authors: Jiaxing Li, Wei Liu, Chao Xue, Yibing Zhan, Xiaoxing Wang, Weifeng Liu, Dacheng Tao

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Concretely, we design a structure-aware hyperparameter embedding to preserve the structural information. Then, we introduce an attention-based deep feature extractor, capable of projecting configurations with different structures from various subspaces into a unified feature space, where the response surfaces can be formulated using a single standard Gaussian Process. The empirical results on a simulation function, various real-world tasks, and the HPO-B benchmark demonstrate that our proposed approach improves the efficacy and efficiency of BO within conditional search spaces.
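The quoted mechanism — projecting configurations from structurally different subspaces into one shared feature space and fitting a single standard Gaussian Process there — can be illustrated with a minimal sketch. Note this is an illustrative reconstruction, not the authors' implementation: the padding-based `extract_features`, the fixed identity map standing in for learned attention weights, and the RBF kernel are all assumptions.

```python
import numpy as np

def rbf_kernel(A, B, ls=1.0):
    """Standard RBF kernel between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def extract_features(config, dim=4):
    """Toy stand-in for the attention-based extractor: pad a
    variable-length configuration to `dim` entries and apply a
    fixed linear map, so every subspace lands in one feature space."""
    x = np.zeros(dim)
    n = min(len(config), dim)
    x[:n] = config[:n]
    W = np.eye(dim)  # placeholder for learned encoder weights
    return W @ x

# Configurations from subspaces with different lengths/structure.
configs = [np.array([0.2, 0.7]), np.array([0.5, 0.1, 0.9]), np.array([0.8])]
y = np.array([1.0, 0.3, 0.6])

Z = np.stack([extract_features(c) for c in configs])  # unified feature space

# A single GP posterior mean at a new configuration, on the shared features.
K = rbf_kernel(Z, Z) + 1e-6 * np.eye(len(Z))
z_new = extract_features(np.array([0.4, 0.6]))[None, :]
mu = rbf_kernel(z_new, Z) @ np.linalg.solve(K, y)
```

Because all subspaces are mapped into one feature space, a single GP models every response surface at once, rather than one GP per subspace.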
Researcher Affiliation | Collaboration | Jiaxing Li1,2*, Wei Liu2, Chao Xue2, Yibing Zhan2, Xiaoxing Wang3, Weifeng Liu1, Dacheng Tao4 — 1China University of Petroleum (East China); 2JD Explore Academy; 3Shanghai Jiao Tong University; 4Nanyang Technological University
Pseudocode | Yes | Algorithm 1: Attn BO: An Attention-based Bayesian Optimization Method for Conditional Search Spaces.
Open Source Code | No | The paper does not provide an explicit statement about releasing source code, nor does it include a link to a code repository.
Open Datasets | Yes | For the NAS tasks, we train each candidate on the CIFAR-10 training set for 100 epochs and evaluate on the testing set. For the OpenML tasks, we designed two conditional search spaces, one for SVM and one for XGBoost, both of which are widely used machine learning models for tabular data. Supported by OpenML (Vanschoren et al. 2013), we consider the 6 most evaluated datasets: [10101, 37, 9967, 9946, 10093, 3494]. We verify this feature on HPO-B-v3 (Pineda Arango et al. 2021), a large-scale hyperparameter optimization benchmark that contains a collection of 935 black-box tasks for 16 hyperparameter search spaces evaluated on 101 datasets.
Dataset Splits | Yes | For the NAS tasks, we train each candidate on the CIFAR-10 training set for 100 epochs and evaluate on the testing set. We meta-train our model on all training data points of the 16 search spaces and fine-tune it on the test tasks to get the final performance.
Hardware Specification | No | The paper discusses computational cost and efficiency but does not specify the exact hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using the Adam optimizer and a Transformer encoder, but does not provide specific version numbers for any software libraries, frameworks, or programming languages.
Experiment Setup | Yes | Specifically, we employ 6 attention blocks with 2 parallel attention heads. The dimensionality of input and output is dmodel = 256 (4 × 64), and the inner layer has a dimensionality of 512. We adopt average pooling to integrate the output of the transformer encoder and utilize a multi-layer perceptron (MLP) with 4 hidden layers, which has [128, 128, 128, 32] units in each hidden layer, to project the features of the configurations into 32-dim vectors. We train the embedding layer and attention-based encoder jointly for 100 epochs using Adam (Kingma and Ba 2015). We set the initial learning rate to 0.001 and reduce it by half every 30 epochs. Following the settings of Bandits-BO, we give 2n random points to initialize BO methods. For the NAS tasks, we train each candidate on the CIFAR-10 training set for 100 epochs and evaluate on the testing set.
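The quoted learning-rate schedule (start at 0.001 and halve every 30 epochs over 100 epochs of Adam training) is a standard step decay. A minimal sketch, with `lr_at` being our illustrative name rather than anything from the paper:

```python
def lr_at(epoch, base_lr=1e-3, step=30, factor=0.5):
    """Step-decay schedule matching the quoted setup:
    start at 0.001 and halve the learning rate every 30 epochs."""
    return base_lr * factor ** (epoch // step)

# The full 100-epoch schedule: 0.001 for epochs 0-29, 0.0005 for 30-59,
# 0.00025 for 60-89, and 0.000125 for 90-99.
schedule = [lr_at(e) for e in range(100)]
```

In a PyTorch-style setup this would typically be expressed with a step scheduler attached to the Adam optimizer, but the paper does not name a specific framework.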