Bayesian Optimization in a Billion Dimensions via Random Embeddings

Authors: Ziyu Wang, Frank Hutter, Masrour Zoghi, David Matheson, Nando de Freitas

JAIR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results confirm that REMBO can effectively solve problems with billions of dimensions, provided the intrinsic dimensionality is low. They also show that REMBO achieves state-of-the-art performance in optimizing the 47 discrete parameters of a popular mixed integer linear programming solver.
Researcher Affiliation | Academia | Ziyu Wang (Department of Computer Science, University of Oxford); Frank Hutter (Department of Computer Science, University of Freiburg); Masrour Zoghi (Department of Computer Science, University of Amsterdam); David Matheson (Department of Computer Science, University of British Columbia); Nando de Freitas (Department of Computer Science, University of Oxford; Canadian Institute for Advanced Research)
Pseudocode | Yes | The paper gives two algorithms:

Algorithm 1: Bayesian Optimization
1: Initialize D_0 = ∅.
2: for t = 1, 2, ... do
3:   Find x_{t+1} ∈ R^D by optimizing the acquisition function u: x_{t+1} = argmax_{x ∈ X} u(x | D_t).
4:   Augment the data: D_{t+1} = D_t ∪ {(x_{t+1}, f(x_{t+1}))}.
5:   Update the kernel hyper-parameters.

Algorithm 2: REMBO, Bayesian Optimization with Random Embedding (blue text in the paper denotes parts changed relative to standard Bayesian optimization)
1: Generate a random matrix A ∈ R^{D×d}.
2: Choose the bounded region set Y ⊂ R^d.
3: Initialize D_0 = ∅.
4: for t = 1, 2, ... do
5:   Find y_{t+1} ∈ R^d by optimizing the acquisition function u: y_{t+1} = argmax_{y ∈ Y} u(y | D_t).
6:   Augment the data: D_{t+1} = D_t ∪ {(y_{t+1}, f(A y_{t+1}))}.
7:   Update the kernel hyper-parameters.
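The loop of Algorithm 2 can be sketched in a few lines. This is a minimal, hypothetical Python sketch, not the paper's implementation: the GP surrogate, the lower-confidence-bound acquisition optimized by candidate sampling, and the choice of bounded region Y = [-√d, √d]^d are all simplified stand-ins (the paper's kernel hyper-parameter updates are omitted). Only the key REMBO idea is preserved: optimize over a low-dimensional y and evaluate the high-dimensional objective at A y.

```python
import numpy as np

def rembo(f, D, d, n_iters=30, n_init=5, seed=0):
    """Minimal REMBO sketch: Bayesian optimization of a D-dimensional
    objective f through a random d-dimensional embedding (minimization)."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(D, d))        # random embedding matrix A in R^{D x d}
    bound = np.sqrt(d)                 # bounded region Y = [-sqrt(d), sqrt(d)]^d (an assumed choice)
    Y = rng.uniform(-bound, bound, size=(n_init, d))
    Z = np.array([f(A @ y) for y in Y])  # evaluate f at the embedded points A y

    def acquisition(Ycand):
        # GP posterior with a fixed RBF kernel; return the lower confidence
        # bound mu - 2*sigma (lower is more promising for minimization).
        def k(P, Q):
            d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1)
            return np.exp(-0.5 * d2)
        K = k(Y, Y) + 1e-6 * np.eye(len(Y))
        Ks = k(Ycand, Y)
        mu = Ks @ np.linalg.solve(K, Z)
        var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
        return mu - 2.0 * np.sqrt(np.clip(var, 0.0, None))

    for _ in range(n_iters):
        # Optimize the acquisition over Y by scoring random candidates
        # (a crude stand-in for the paper's use of DIRECT / CMA-ES).
        cand = rng.uniform(-bound, bound, size=(256, d))
        y_next = cand[np.argmin(acquisition(cand))]
        Y = np.vstack([Y, y_next])
        Z = np.append(Z, f(A @ y_next))
    return Z.min()
```

For example, a toy objective with intrinsic dimensionality 2 embedded in D = 1000 dimensions (only the first two coordinates matter) can be minimized with `rembo(lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2, D=1000, d=2)`, while the surrogate model only ever works in 2 dimensions.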
Open Source Code | Yes | "The code for REMBO, as well as all data used in our experiments, is publicly available at https://github.com/ziyuw/rembo."
Open Datasets | No | The paper mentions a "de = 2-dimensional benchmark function for Bayesian optimization, embedded in a D-dimensional space" (the Branin function), "a configuration problem obtained from Hutter et al. (2010), aiming to configure the 40 binary and 7 categorical parameters of lpsolve", and data for a "random forest body part classifier ... from a random pose of the CMU mocap dataset". While these sources may be publicly known, the paper does not provide concrete access information (links, DOIs, or specific citations) for the datasets themselves. The lpsolve problem cites a paper, but that citation is for the problem, not a dataset download.
Dataset Splits | Yes | For the random forest body part classifier, the paper states: "Both validation and test data contained all pixels in the 500 validation and test images, respectively." For the lpsolve configuration: "we only consider configuration on a single problem instance."
Hardware Specification | No | The paper mentions "In total, we used over half a year of CPU time for the experiments in this paper." but does not specify any particular CPU models, GPU models, or other hardware details.
Software Dependencies | No | The paper mentions "lpsolve (Berkelaar, Eikland, & Notebaert, 2016)", "DIRECT (Jones et al., 1993)", and "CMA-ES (Hansen & Ostermeier, 2001)". While these are software components, no version numbers are provided for DIRECT or CMA-ES, and the lpsolve citation gives a publication year rather than the version of the software used.
Experiment Setup | Yes | "In our experiments, we set the initial constraint [L, U] to be [0.01, 50] and set tσ = 0.002." Table 2 gives the parameter ranges for the random forest classifier (for the purpose of optimization, the maximum tree depth and the number of potential offsets were transformed to log space):
  Max. tree depth: [1, 60]
  Min. no. of samples for non-leaf nodes: [1, 100]
  No. of potential offsets to evaluate: [1, 5000]
  Bootstrap for per-tree sampling: {T, F}
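The log-space transformation of the tree depth and offset-count ranges can be sketched as follows. This is a hedged illustration, not the paper's code: the `ranges` dictionary and the `decode` helper are hypothetical names, and mapping from the unit interval is an assumed convention for how an optimizer might handle these parameters.

```python
import math

# Parameter ranges from Table 2, with an assumed per-parameter scale flag:
# max. tree depth and no. of potential offsets are searched in log space.
ranges = {
    "max_tree_depth": (1, 60, "log"),
    "min_samples_nonleaf": (1, 100, "linear"),
    "n_potential_offsets": (1, 5000, "log"),
}

def decode(u, lo, hi, scale):
    """Map u in [0, 1] to an integer in [lo, hi] on the given scale."""
    if scale == "log":
        val = math.exp(math.log(lo) + u * (math.log(hi) - math.log(lo)))
    else:
        val = lo + u * (hi - lo)
    return int(round(val))
```

Searching in log space means equal steps of the optimizer's variable multiply the parameter by a constant factor, which spreads trial points more evenly across wide ranges such as [1, 5000].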