Language-Based Bayesian Optimization Research Assistant (BORA)

Authors: Abdoulatif Cissé, Xenophon Evangelopoulos, Vladimir V. Gusev, Andrew I. Cooper

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate the effectiveness of our approach on synthetic benchmarks with up to 15 variables and demonstrate the ability of LLMs to reason in four real-world experimental tasks where context-aware suggestions boost optimization performance substantially.
Researcher Affiliation | Academia | Department of Chemistry, University of Liverpool, England, UK; Leverhulme Research Centre for Functional Materials Design, University of Liverpool, England, UK; Department of Computer Science, University of Liverpool, England, UK
Pseudocode | Yes | Algorithm 1 BORA. Input: experiment card, number of initial samples n_init, maximum number of samples i_max. Output: y_max, LLM comments C, and final report.
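The overall loop implied by Algorithm 1 can be sketched as below. This is a minimal illustration only: the inner step is stubbed with random search, whereas the real BORA alternates between a GP-based BO action and an LLM intervention, and names such as `bora_loop` are illustrative, not the authors' API.

```python
import random

def bora_loop(f, bounds, n_init=5, i_max=105, seed=0):
    """Sketch of the Algorithm 1 loop: initial design, then iterate
    to the sample budget, collecting comments and the best value."""
    rng = random.Random(seed)
    # Initial design of n_init random samples within the bounds.
    X = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_init)]
    Y = [f(x) for x in X]
    comments = []
    for i in range(i_max - n_init):
        # Placeholder step: real BORA chooses between a BO action and
        # an LLM-proposed sample here; we stub both with random search.
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        y = f(x)
        X.append(x)
        Y.append(y)
        comments.append(f"step {i}: y={y:.3f}")
    return max(Y), comments
```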
Open Source Code | Yes | The source code is available at https://github.com/Ablatif6c/bora-the-explorer.
Open Datasets | Yes | Synthetic function benchmarks: Branin (2D): a function whose global maximum occurs at three distinct locations, as shown in Figure 3. ... Levy (10D): ... Ackley (15D): ... Solar Energy Production (4D): maximizing the daily energy output of a solar panel by optimizing panel tilt, azimuth, and system parameters [Anderson et al., 2023]. ... Sugar Beet Production (8D): maximizing the monthly sugar beet Total Above Ground Production (TAGP) in a greenhouse by tuning the irradiance and other weather and soil conditions [de Wit and contributors, 2024]. ... Hydrogen Production (10D): maximizing the hydrogen evolution rate (HER) for a multi-component catalyst mixture by tuning discrete chemical inputs under the constraint that the total volume of the chemicals must not exceed 5 mL. Because the problem is discrete and constrained, all compared methods were adapted accordingly, employing the bespoke implementation of [Burger et al., 2020]. Dataset acquired from [Cissé et al., 2024].
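For reference, the Branin benchmark in its standard textbook form is reproduced below. The constants are the usual ones; since BORA maximizes, the benchmark is presumably the negated form, whose three global maxima correspond to the three global minima of the standard function (value ≈ 0.397887).

```python
import math

def branin(x1, x2):
    """Standard Branin function with the usual textbook constants.
    Global minima (~0.397887) at (-pi, 12.275), (pi, 2.275),
    and (9.42478, 2.475)."""
    a = 1.0
    b = 5.1 / (4 * math.pi ** 2)
    c = 5 / math.pi
    r, s, t = 6.0, 10.0, 1 / (8 * math.pi)
    return (a * (x2 - b * x1 ** 2 + c * x1 - r) ** 2
            + s * (1 - t) * math.cos(x1) + s)

def neg_branin(x1, x2):
    """Negated Branin: the maximization form with three global maxima."""
    return -branin(x1, x2)
```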
Dataset Splits | No | All methods were initialized with n_init = 5 initial samples, apart from LAEA, which used 15 initial samples to keep the same evaluations-to-population-size ratio as in [Hao et al., 2024]. The maximum number of samples was set to 105 to account for realistic budgets with expensive functions. The paper describes initial samples and a maximum sample budget for an optimization process, not traditional train/validation/test dataset splits.
Hardware Specification | No | No specific hardware details (such as GPU/CPU models, processor types, or memory) are provided in the paper. The paper mentions 'GPT-4o-mini', but this refers to the language model, not the experimental hardware.
Software Dependencies | No | We implemented BORA using OpenAI's most cost-effective model at the time, GPT-4o-mini [OpenAI, 2025], which was not fine-tuned, in an effort to make BORA more accessible to users with limited resources. For the BO action implementation, the GP uses a Matérn kernel, and the acquisition function is EI. We set q = 5,000 for σ^GP_{t,mean}. The paper names the specific LLM used but does not provide version numbers for other ancillary software components, libraries, or programming languages used in the implementation.
Experiment Setup | Yes | We set q = 5,000 for σ^GP_{t,mean}. For the BO action implementation, the GP uses a Matérn kernel, and the acquisition function is EI. ... σ_{t,upper} = 0.5·σ^GP_{t,max} and σ_{t,lower} = 0.3·σ^GP_{t,max}; ... γ = 0.05, n_BO = 5, n_LBO = 2; ... The plateau duration m is initialized at m_init = 2d, set to vary between m_min = 0 and m_max = 3·m_init, and is automatically adjusted at every LLM intervention step l ... the maximum allowed adjustment per step is set to 15 ... ϵ = 10^-6 is a small constant ... W = min(|H|, 3). ... All methods were initialized with n_init = 5 initial samples, apart from LAEA, which used 15 initial samples ... The maximum number of samples was set to 105 to account for realistic budgets with expensive functions.
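The reported BO action (GP with a Matérn kernel, EI acquisition, q = 5,000 candidates) can be sketched as follows. Only the kernel, acquisition, and candidate count come from the paper; the random candidate generation, `xi` jitter, and `ei_suggest` name are generic assumptions, not the authors' exact implementation.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def ei_suggest(X, y, bounds, q=5000, xi=0.01, seed=0):
    """Fit a Matérn-kernel GP to (X, y) and return the candidate (out of
    q random points) that maximizes Expected Improvement."""
    rng = np.random.default_rng(seed)
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(np.asarray(X), np.asarray(y))
    lo, hi = np.array(bounds).T
    cand = rng.uniform(lo, hi, size=(q, len(bounds)))  # q candidate points
    mu, sigma = gp.predict(cand, return_std=True)
    best = np.max(y)
    imp = mu - best - xi                       # improvement over incumbent
    z = imp / np.maximum(sigma, 1e-12)
    ei = imp * norm.cdf(z) + sigma * norm.pdf(z)
    ei[sigma == 0] = 0.0                       # no improvement where GP is certain
    return cand[np.argmax(ei)]
```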