Bayesian Optimization from Human Feedback: Near-Optimal Regret Bounds
Authors: Aya Kayal, Sattar Vakili, Laura Toni, Da-Shan Shiu, Alberto Bernacchia
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present experimental results on the performance of MR-LPF on synthetic functions that closely align with the analytical assumptions, as well as on a dataset of Yelp reviews, demonstrating the utility of the proposed algorithm in real-world applications (Section 5). |
| Researcher Affiliation | Collaboration | Aya Kayal's work was part of her research placement at MediaTek Research. 1 University College London, UK; 2 MediaTek Research. Correspondence to: Aya Kayal <EMAIL>, Sattar Vakili <EMAIL>. |
| Pseudocode | Yes | Pseudocode is provided in Algorithm 1. |
| Open Source Code | Yes | Our implementation is publicly available at https://github.com/ayakayal/BOHF_code_submission |
| Open Datasets | Yes | To showcase the utility of our approach in real-world applications, we experimented using the Yelp Open Dataset of restaurant reviews. |
| Dataset Splits | No | The paper describes the processing of the Yelp Dataset, including concatenating reviews, generating vector embeddings, scaling user ratings, and handling missing ratings via collaborative filtering. It also mentions sampling a random user for each experimental run. However, it does not explicitly provide training, validation, or test splits (e.g., percentages or sample counts) for the datasets used in the experiments. |
| Hardware Specification | Yes | The code is executed on a cluster with 376.2 GiB of RAM and an Intel(R) Xeon(R) Gold 5118 CPU running at 2.30 GHz. In the case of the Yelp Dataset experiments, ... The simulations are carried out on a computing node equipped with an NVIDIA GeForce RTX 2080 Ti GPU featuring 11 GB of VRAM, an Intel(R) Xeon(R) Gold 5118 CPU running at 2.40 GHz with 24 cores, and 92 GB of RAM. |
| Software Dependencies | No | For the experiments with the synthetic RKHS and Ackley functions, we utilize the scikit-learn library (Pedregosa et al., 2011) for implementing Gaussian Process (GP) regression. ... we use the BoTorch library (Balandat et al., 2020) and its dependencies, including GPyTorch (Gardner et al., 2018), which offer efficient GP regression tools with GPU support. ... OpenAI's text-embedding-3-large model... While software libraries like scikit-learn, BoTorch, and GPyTorch, and the OpenAI model are mentioned, specific version numbers for these software components are not provided. |
| Experiment Setup | Yes | We choose l = 0.1 as the length scale and λ = 0.05 as the kernel-based learning parameter across all cases. The horizon T is set to 300 for RKHS test functions and 2000 for the Ackley function and the Yelp Dataset. For the RKHS and Ackley functions, the confidence interval width β is fixed at 1 for both MR-LPF and MaxMinLCB. For the Yelp dataset, we conduct a grid search to tune β over {0.01, 0.1, 0.5, 1, 2} for both MR-LPF and MaxMinLCB. We determine β = 2 as optimal for MaxMinLCB and β = 0.1 for MR-LPF. |
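The experiment-setup row can be made concrete with a minimal sketch: a scikit-learn GP surrogate using the reported length scale l = 0.1 and regularization λ = 0.05, and a loop over the reported β grid to form a confidence-based acquisition. The objective function, sample sizes, and the UCB-style acquisition below are illustrative assumptions, not the paper's MR-LPF algorithm or its RKHS test functions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Stand-in objective on [0, 2]; the paper's actual test functions
# are RKHS samples, the Ackley function, and Yelp-derived rewards.
def objective(x):
    return np.sin(3 * x) * np.exp(-x)

# Small noisy training set (sizes are illustrative).
X_train = rng.uniform(0, 2, size=(20, 1))
y_train = objective(X_train).ravel() + 0.05 * rng.standard_normal(20)

# GP with the reported length scale l = 0.1; alpha plays the role of
# the kernel-based learning parameter lambda = 0.05.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.1), alpha=0.05)
gp.fit(X_train, y_train)

X_test = np.linspace(0, 2, 200).reshape(-1, 1)
mu, sigma = gp.predict(X_test, return_std=True)

# Grid over the confidence-interval width beta, as tuned for the Yelp
# experiments; each beta yields a different acquisition maximizer.
candidates = {}
for beta in [0.01, 0.1, 0.5, 1, 2]:
    ucb = mu + beta * sigma  # simple UCB stand-in for the acquisition
    candidates[beta] = float(X_test[np.argmax(ucb)])
```

In a full tuning run, each β value would drive an entire optimization loop and the β with the best cumulative reward would be kept (β = 0.1 for MR-LPF, β = 2 for MaxMinLCB per the paper); here the loop only illustrates how β widens the confidence band.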