Language-Based Bayesian Optimization Research Assistant (BORA)

Authors: Abdoulatif Cissé, Xenophon Evangelopoulos, Vladimir V. Gusev, Andrew I. Cooper

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate the effectiveness of our approach on synthetic benchmarks with up to 15 variables and demonstrate the ability of LLMs to reason in four real-world experimental tasks where context-aware suggestions boost optimization performance substantially.
Researcher Affiliation | Academia | Department of Chemistry, University of Liverpool, England, UK; Leverhulme Research Centre for Functional Materials Design, University of Liverpool, England, UK; Department of Computer Science, University of Liverpool, England, UK
Pseudocode | Yes | Algorithm 1 BORA. Input: experiment card, number of initial samples n_init, maximum number of samples i_max. Output: y_max, LLM comments C, and final report.
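The overall loop implied by Algorithm 1 can be sketched as below. This is a minimal illustration only: the inner step is stubbed with random search, whereas the real BORA alternates between a GP-based BO action and an LLM intervention, and names such as `bora_loop` are illustrative, not the authors' API.

```python
import random

def bora_loop(f, bounds, n_init=5, i_max=105, seed=0):
    """Sketch of the Algorithm 1 loop: initial design, then iterate
    to the sample budget, collecting comments and the best value."""
    rng = random.Random(seed)
    # Initial design of n_init random samples within the bounds.
    X = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_init)]
    Y = [f(x) for x in X]
    comments = []
    for i in range(i_max - n_init):
        # Placeholder step: real BORA chooses between a BO action and
        # an LLM-proposed sample here; we stub both with random search.
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        y = f(x)
        X.append(x)
        Y.append(y)
        comments.append(f"step {i}: y={y:.3f}")
    return max(Y), comments
```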
Open Source Code | Yes | The source code is available at https://github.com/Ablatif6c/bora-the-explorer.
Open Datasets | Yes | Synthetic function benchmarks: Branin (2D): a function whose global maximum occurs at three distinct locations, as shown in Figure 3. ... Levy (10D): ... Ackley (15D): ... Solar Energy Production (4D): maximizing the daily energy output of a solar panel by optimizing panel tilt, azimuth, and system parameters [Anderson et al., 2023]. ... Sugar Beet Production (8D): maximizing the monthly sugar beet Total Above Ground Production (TAGP) in a greenhouse by tuning the irradiance and other weather and soil conditions [de Wit and contributors, 2024]. ... Hydrogen Production (10D): maximizing the hydrogen evolution rate (HER) for a multi-component catalyst mixture by tuning discrete chemical inputs under the constraint that the total volume of the chemicals must not exceed 5 mL. Because the problem is discrete and constrained, all compared methods were adapted accordingly, employing the bespoke implementation of [Burger et al., 2020]. Dataset acquired from [Cissé et al., 2024].
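For reference, the Branin benchmark in its standard textbook form is reproduced below. The constants are the usual ones; since BORA maximizes, the benchmark is presumably the negated form, whose three global maxima correspond to the three global minima of the standard function (value ≈ 0.397887).

```python
import math

def branin(x1, x2):
    """Standard Branin function with the usual textbook constants.
    Global minima (~0.397887) at (-pi, 12.275), (pi, 2.275),
    and (9.42478, 2.475)."""
    a = 1.0
    b = 5.1 / (4 * math.pi ** 2)
    c = 5 / math.pi
    r, s, t = 6.0, 10.0, 1 / (8 * math.pi)
    return (a * (x2 - b * x1 ** 2 + c * x1 - r) ** 2
            + s * (1 - t) * math.cos(x1) + s)

def neg_branin(x1, x2):
    """Negated Branin: the maximization form with three global maxima."""
    return -branin(x1, x2)
```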
Dataset Splits | No | All methods were initialized with n_init = 5 initial samples, apart from LAEA, which used 15 initial samples to keep the same evaluations-to-population-size ratio as in [Hao et al., 2024]. The maximum number of samples was set to 105 to account for realistic budgets with expensive functions. The paper describes initial samples and a maximum sample budget for an optimization process, not traditional train/validation/test dataset splits.
Hardware Specification | No | No specific hardware details (such as GPU/CPU models, processor types, or memory) are provided in the paper. The paper mentions 'GPT-4o-mini', but this refers to the language model, not the experimental hardware.
Software Dependencies | No | We implemented BORA using OpenAI's most cost-effective model at the time, GPT-4o-mini [OpenAI, 2025], which was not fine-tuned, in an effort to make BORA more accessible to users with limited resources. For the BO action implementation, the GP uses a Matérn kernel, and the acquisition function is EI. We set q = 5,000 for σ^GP_{t,mean}. The paper names the specific LLM used but does not provide version numbers for other ancillary software components, libraries, or programming languages used in the implementation.
Experiment Setup | Yes | We set q = 5,000 for σ^GP_{t,mean}. For the BO action implementation, the GP uses a Matérn kernel, and the acquisition function is EI. ... σ_{t,upper} = 0.5·σ^GP_{t,max} and σ_{t,lower} = 0.3·σ^GP_{t,max}; ... γ = 0.05, n_BO = 5, n_LBO = 2; ... The plateau duration m is initialized at m_init = 2d, set to vary between m_min = 0 and m_max = 3·m_init, and is automatically adjusted at every LLM intervention step l ... the maximum allowed adjustment per step is set to 15 ... ϵ = 10^-6 is a small constant ... W = min(|H|, 3). ... All methods were initialized with n_init = 5 initial samples, apart from LAEA, which used 15 initial samples ... The maximum number of samples was set to 105 to account for realistic budgets with expensive functions.
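The reported BO action (GP with a Matérn kernel, EI acquisition, q = 5,000 candidates) can be sketched as follows. Only the kernel, acquisition, and candidate count come from the paper; the random candidate generation, `xi` jitter, and `ei_suggest` name are generic assumptions, not the authors' exact implementation.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def ei_suggest(X, y, bounds, q=5000, xi=0.01, seed=0):
    """Fit a Matérn-kernel GP to (X, y) and return the candidate (out of
    q random points) that maximizes Expected Improvement."""
    rng = np.random.default_rng(seed)
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(np.asarray(X), np.asarray(y))
    lo, hi = np.array(bounds).T
    cand = rng.uniform(lo, hi, size=(q, len(bounds)))  # q candidate points
    mu, sigma = gp.predict(cand, return_std=True)
    best = np.max(y)
    imp = mu - best - xi                       # improvement over incumbent
    z = imp / np.maximum(sigma, 1e-12)
    ei = imp * norm.cdf(z) + sigma * norm.pdf(z)
    ei[sigma == 0] = 0.0                       # no improvement where GP is certain
    return cand[np.argmax(ei)]
```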