FunBO: Discovering Acquisition Functions for Bayesian Optimization with FunSearch

Authors: Virginia Aglietti, Ira Ktena, Jessica Schrouff, Eleni Sgouritsa, Francisco Ruiz, Alan Malek, Alexis Bellot, Silvia Chiappa

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments explore FunBO's ability to generate novel and efficient AFs across a wide variety of settings. In particular, we demonstrate its potential to generate AFs that generalize well to the optimization of functions both in distribution (ID, i.e. within function classes) and out of distribution (OOD, i.e. across function classes) by running three different types of experiments: 1. OOD-Bench tests generalization across function classes... 2. ID-Bench, HPO-ID and GPs-ID test FunBO-generated AFs within function classes... 3. FEW-SHOT demonstrates how FunBO can be used in the context of few-shot fast adaptation of an AF.
Researcher Affiliation | Industry | Google DeepMind, London, UK. Correspondence to: Virginia Aglietti <EMAIL>. *Now at GlaxoSmithKline.
Pseudocode | Yes | Figure 1. Left: The FunBO algorithm. Right: Graphical representation of FunBO. The FunBO components that differ from FunSearch (Romera-Paredes et al., 2023, Fig. 1) are highlighted in color. ... Fig. 6 gives the Python code for the initial acquisition function used by FunBO, including the full docstring. ... Fig. 7 for Python code) ... Fig. 8. Python code for the second part of e used in FunBO.
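The paper's own code listings (Figs. 6-8) are not reproduced in this report. As a hedged illustration only, a standard Expected Improvement acquisition function, a common choice for an initial AF, can be sketched with numpy and scipy.stats.norm (the libraries the listings reportedly use); the function name and the `xi` jitter parameter are assumptions, not the paper's actual initial AF:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mean, std, best_y, xi=0.01):
    """Expected Improvement over the incumbent best_y (maximization).

    mean, std: posterior mean and std of the GP surrogate on the grid.
    xi: exploration jitter (illustrative default).
    """
    std = np.maximum(std, 1e-12)  # guard against zero predictive std
    z = (mean - best_y - xi) / std
    return (mean - best_y - xi) * norm.cdf(z) + std * norm.pdf(z)
```

A FunBO-style search would start from a function of this shape and let the LLM evolve its body while the scoring loop evaluates each candidate AF on the training functions.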
Open Source Code | No | See code at github.com/google-deepmind/funsearch. (This refers to the FunSearch framework, not explicitly the FunBO implementation described in this paper.)
Open Datasets | Yes | Our experiments explore FunBO's ability to generate novel and efficient AFs across a wide variety of settings. In particular, we demonstrate its potential to generate AFs that generalize well to the optimization of functions both in distribution (ID, i.e. within function classes) and out of distribution (OOD, i.e. across function classes) by running three different types of experiments: 1. OOD-Bench tests generalization across function classes by running FunBO with G containing different standard global optimization benchmarks and testing on a set F ... HPO-ID. We test FunBO on two HPO tasks where the goal is to minimize the loss (d = 2) of an RBF-based SVM and an AdaBoost algorithm. ... We use precomputed loss values across 50 datasets given as part of the HyLAP project.
Dataset Splits | Yes | For each of these three functions, we train both FunBO and MetaBO with |G| = 25 instances of the original function obtained by scaling and translating it with values in [0.9, 1.1] and [-0.1, 0.1]^d respectively. For FunBO we randomly assign 5 functions in G to G_V and keep the rest in G_Tr. ... we train FunBO and MetaBO with losses computed on a random selection of 35 of the 50 available datasets and test on losses computed on the remaining 15 datasets. For FunBO we randomly assign 5 datasets to G_V and keep the rest in G_Tr. ... FSAF trains the initial AF with a set of GPs, adapts it using 5 instances of scaled and translated Ackley functions, then tests the adapted AF on 100 additional Ackley instances, generated in the same manner.
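The split procedure quoted above (randomly assign 5 elements of G to the validation set G_V, keep the rest in G_Tr) can be sketched as follows; the function name and fixed seed are illustrative assumptions, not the paper's code:

```python
import random

def split_train_val(functions, n_val=5, seed=0):
    """Randomly assign n_val elements to the validation set G_V and
    keep the rest in the training set G_Tr (sketch of the split
    described in the paper; names and seed are illustrative)."""
    rng = random.Random(seed)
    shuffled = list(functions)
    rng.shuffle(shuffled)
    return shuffled[n_val:], shuffled[:n_val]  # G_Tr, G_V

# With |G| = 25 instances, this yields 20 training and 5 validation functions.
g_tr, g_v = split_train_val(range(25))
```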
Hardware Specification | No | For AF sampling, we used 5 Codey instances running on tensor processing units on a computing cluster. For scoring, we used 100 CPU evaluators per LLM instance.
Software Dependencies | No | We employ Codey, an LLM fine-tuned on a large code corpus and based on the PaLM model family (Google-PaLM-2-Team, 2023), to generate AFs. ... Figures 6, 7, and 8 show Python code that utilizes libraries like numpy, scipy.stats.norm, and GPy without specifying version numbers. All experiments are conducted using FunSearch with default hyperparameters, but no software version is specified for FunSearch.
Experiment Setup | Yes | To isolate the effects of different AFs, we employ the same experimental setting across all methods in terms of: (i) the number of trials T; (ii) the hyperparameters of the GP surrogate models (tuned offline); (iii) the evaluation grid for the AF, which is set to be a Sobol grid (Sobol', 1967) on the input space; and (iv) the initial design, which includes the input point yielding the maximum function value on the grid. ... The Gaussian likelihood noise σ² is set to 1e-5 unless otherwise stated. We set T = 30 for all experiments apart from HPO-ID and GPs-ID for which we use T = 20 to ensure faster evaluations of generated AFs. We run FunBO with T = 48hrs, B = 12 and NDB = 10. We ran UCB with β = 1. Table 1 and Table 2 provide specific parameters (d, NSG, ℓ, σ²_f, σ²) for objective functions.
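The setup quoted above evaluates each AF on a Sobol grid and, for the UCB baseline, uses β = 1. A minimal sketch of that evaluation step, assuming the common UCB form mean + sqrt(β)·std and using scipy's quasi-Monte Carlo Sobol sampler (grid size, dimensionality, and the stand-in posterior below are illustrative, not the paper's values):

```python
import numpy as np
from scipy.stats import qmc

def ucb(mean, std, beta=1.0):
    """Upper Confidence Bound acquisition: mean + sqrt(beta) * std."""
    return mean + np.sqrt(beta) * std

# Sobol evaluation grid on a d-dimensional unit cube (illustrative sizes).
d = 2
sobol = qmc.Sobol(d=d, scramble=False)
grid = sobol.random_base2(m=6)  # 2**6 = 64 grid points

# Stand-in posterior; a real run would query the tuned GP surrogate here.
mean = np.sin(grid).sum(axis=1)
std = 0.1 * np.ones(len(grid))

# Next query point: the grid location maximizing the acquisition value.
next_x = grid[np.argmax(ucb(mean, std, beta=1.0))]
```

With β = 1 the two common UCB parameterizations (β multiplying std directly or under a square root) coincide, so the sketch matches either reading of the quoted setting.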