HEBO: Pushing The Limits of Sample-Efficient Hyper-parameter Optimisation
Authors: Alexander I. Cowen-Rivers, Wenlong Lyu, Rasul Tutunov, Zhi Wang, Antoine Grosnit, Ryan-Rhys Griffiths, Alexandre Max Maraval, Hao Jianye, Jun Wang, Jan Peters, Haitham Bou-Ammar
JAIR 2022 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results on the Bayesmark benchmark indicate that heteroscedasticity and non-stationarity pose significant challenges for black-box optimisers. ... We demonstrate HEBO's empirical efficacy on the NeurIPS 2020 Black-Box Optimisation challenge, where HEBO placed first. Upon further analysis, we observe that HEBO significantly outperforms existing black-box optimisers on 108 machine learning hyperparameter tuning tasks comprising the Bayesmark benchmark. |
| Researcher Affiliation | Collaboration | Alexander I. Cowen-Rivers EMAIL (Corresponding author) Huawei R&D, Technische Universität Darmstadt; Wenlong Lyu EMAIL; Rasul Tutunov EMAIL; Zhi Wang EMAIL; Antoine Grosnit EMAIL Huawei R&D; Ryan-Rhys Griffiths EMAIL University of Cambridge; Alexandre Max Maraval EMAIL; Hao Jianye EMAIL Huawei R&D; Jun Wang EMAIL Huawei R&D, University College London; Jan Peters EMAIL Technische Universität Darmstadt; Haitham Bou-Ammar EMAIL Huawei R&D, University College London |
| Pseudocode | No | The paper describes methods and provides mathematical formulations and proofs (e.g., in Appendix B), but it does not include any clearly labeled pseudocode or algorithm blocks for the HEBO system or its components in a structured, step-by-step format. |
| Open Source Code | Yes | All code is made available at https://github.com/huawei-noah/HEBO. |
| Open Datasets | Yes | To that end, we undertake our evaluation in 2140 experiments from 108 real-world problems from the UCI repository (Dua & Graff, 2017), which was also the testbed of choice for the NeurIPS 2020 Black-Box Optimisation challenge (Turner et al., 2021). |
| Dataset Splits | Yes | Values of the black-box objective are stochastic with noise contributions originating from the train-test splits used to compute the losses. ... Experimentation was facilitated by the Bayesmark2 package. ... Each experiment is repeated for 20 random seeds. |
| Hardware Specification | No | The paper mentions "exploiting graphical processing units (Knudde et al., 2017; Balandat et al., 2020)" in the context of general GP surrogates but does not provide specific hardware details (e.g., GPU models, CPU models, or memory specifications) used for their own experiments. |
| Software Dependencies | No | The paper mentions several software packages and libraries, such as Pymoo (Blank & Deb, 2020), SkOpt (Pedregosa et al., 2011), and HyperOpt (Bergstra et al., 2015), that were either used directly or included for comparison. However, it does not provide specific version numbers for these or other critical software components necessary for replication. |
| Experiment Setup | Yes | We run experiments on either 16 iterations with a batch of 8 query points per iteration or 100 iterations with 1 query point. Each experiment is repeated for 20 random seeds. ... Full hyper-parameter search spaces are defined in Table 2 and Table 3. |
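The two evaluation budgets quoted in the Experiment Setup row (16 iterations with a batch of 8 suggestions, or 100 iterations with 1 suggestion, each over 20 seeds) follow a standard suggest/observe loop. The sketch below illustrates that loop structure only; the `RandomSearchOptimiser` stub, its bounds, and the toy objective are hypothetical stand-ins, not HEBO's actual API.

```python
import random


class RandomSearchOptimiser:
    """Hypothetical stub with a suggest/observe interface, standing in
    for a real Bayesian optimiser such as HEBO (not its actual API)."""

    def __init__(self, lb, ub, seed=0):
        self.rng = random.Random(seed)
        self.lb, self.ub = lb, ub
        self.best = float("inf")  # incumbent objective value

    def suggest(self, n_suggestions=1):
        # Propose a batch of candidate hyper-parameter values.
        return [self.rng.uniform(self.lb, self.ub) for _ in range(n_suggestions)]

    def observe(self, xs, ys):
        # Record observed objective values and update the incumbent.
        self.best = min(self.best, min(ys))


def run(iterations, batch_size, seed):
    opt = RandomSearchOptimiser(-3.0, 3.0, seed=seed)
    for _ in range(iterations):
        xs = opt.suggest(n_suggestions=batch_size)
        ys = [(x - 0.37) ** 2 for x in xs]  # toy black-box objective
        opt.observe(xs, ys)
    return opt.best


# The two budgets from the paper: 16 x 8 (batched) and 100 x 1
# (sequential), each repeated over 20 random seeds.
batched = [run(16, 8, s) for s in range(20)]
sequential = [run(100, 1, s) for s in range(20)]
```

Both budgets evaluate roughly the same number of query points (128 vs. 100), which is what makes the batched and sequential settings comparable across seeds.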