reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

User Preference Meets Pareto-Optimality in Multi-Objective Bayesian Optimization

Authors: Joshua Hang Sai Ip, Ankush Chakrabarty, Ali Mesbah, Diego Romeres

AAAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	PUB-MOBO is tested on three synthetic benchmark problems: DTLZ1, DTLZ2 and DH1, as well as on three real-world problems: Vehicle Safety, Conceptual Marine Design, and Car Side Impact. PUB-MOBO consistently outperforms state-of-the-art competitors in terms of proximity to the Pareto-front and utility regret across all the problems. ... We empirically validate the proposed PUB-MOBO algorithm on 6 benchmark problems and report the performance of utility regret, R, and distance to Pareto-front, d_Pareto, against outcome evaluations and user queries.
Researcher Affiliation	Collaboration	Joshua Hang Sai Ip¹, Ankush Chakrabarty², Ali Mesbah¹, Diego Romeres²; ¹University of California, Berkeley; ²Mitsubishi Electric Research Laboratories
Pseudocode	Yes	Algorithm 1: PUB-MOBO ... Algorithm 2: LOCAL GRADIENT DESCENT
Open Source Code	No	The paper does not provide an explicit statement about the release of its source code or a link to a code repository for the methodology described.
Open Datasets	Yes	PUB-MOBO is tested on three synthetic benchmark problems: DTLZ1, DTLZ2 and DH1, as well as on three real-world problems: Vehicle Safety, Conceptual Marine Design, and Car Side Impact. ... We examine 3 synthetic problems that are commonly found in MOO literature: DTLZ1, DTLZ2 (Deb et al. 2005), and DH1 (Deb and Gupta 2005). ... We examine 3 problems based on real MOO problems: Vehicle Safety (Liao et al. 2008), Conceptual Marine Design (Parsons and Scott 2004), and Car Side Impact (Jain and Deb 2013). The implementations of these problems are taken from (Tanabe and Ishibuchi 2020).
Dataset Splits	No	The paper mentions running experiments across "20 seeds" and uses an "outcome evaluation budget" but does not specify how the datasets themselves (synthetic or real-world benchmarks) were split into training, validation, or test sets for reproduction, nor does it refer to standard splits used for these problems.
Hardware Specification	Yes	All experimental results are obtained using a 13th Gen Intel Core i7-13620H repeated across 20 seeds with hyperparameters n_GD = 10, n_GI = 1, ε = 0.1.
Software Dependencies	No	The paper discusses various algorithms and models such as Gaussian processes (GPs), q-Expected Hypervolume Improvement (qEHVI), EUBO, qEIUU, the Multiple Gradient Descent Algorithm (MGDA), and the Frank-Wolfe algorithm, but it does not specify any particular software libraries, programming languages, or their version numbers used for implementation (e.g., 'Python 3.8' or 'PyTorch 1.9').
Experiment Setup	Yes	All experimental results are obtained using a 13th Gen Intel Core i7-13620H repeated across 20 seeds with hyperparameters n_GD = 10, n_GI = 1, ε = 0.1.
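The report does not say which implementation of the synthetic benchmarks the paper used (only the real-world problems are attributed to Tanabe and Ishibuchi 2020). As a reproduction aid, the following is a minimal sketch of the standard DTLZ2 definition (Deb et al. 2005), written as a minimization problem. The function name `dtlz2` and its signature are our own illustration, not taken from the paper, and the number of decision variables and objectives the paper used is not stated in this report.

```python
import math
from typing import Sequence


def dtlz2(x: Sequence[float], n_obj: int) -> list[float]:
    """Evaluate DTLZ2 for a decision vector x in [0, 1]^n with n >= n_obj.

    Hypothetical helper for illustration; follows the textbook definition.
    """
    n = len(x)
    if n < n_obj:
        raise ValueError("DTLZ2 needs at least n_obj decision variables")
    # Distance function g over the last n - n_obj + 1 variables; g = 0 on the Pareto set.
    g = sum((xi - 0.5) ** 2 for xi in x[n_obj - 1:])
    f = []
    for i in range(n_obj):
        val = 1.0 + g
        # Objective i multiplies the cosines of the first (n_obj - 1 - i) position variables...
        for j in range(n_obj - 1 - i):
            val *= math.cos(x[j] * math.pi / 2)
        # ...and every objective except the first also gets one sine term.
        if i > 0:
            val *= math.sin(x[n_obj - 1 - i] * math.pi / 2)
        f.append(val)
    return f
```

On the Pareto set (all distance variables equal to 0.5) g vanishes and the objective vector lies on the unit sphere, so the sum of squared objectives equals 1. That gives a quick sanity check for whichever DTLZ2 implementation a reproduction ends up using.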
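The report notes that the paper discusses MGDA and the Frank-Wolfe algorithm without naming an implementation. For two objectives, MGDA's subproblem (finding the minimum-norm point in the convex hull of the objective gradients) has a closed form; for more objectives an iterative solver such as Frank-Wolfe is commonly used. The sketch below shows only the two-objective closed form, under our own assumptions (gradients as plain Python lists, a hypothetical function name `mgda_two_objective`). It is not the paper's Algorithm 2 and says nothing about how PUB-MOBO combines this step with user preferences.

```python
from typing import Sequence


def mgda_two_objective(g1: Sequence[float], g2: Sequence[float]) -> tuple[float, list[float]]:
    """Return (alpha, d), where d = alpha*g1 + (1 - alpha)*g2 has minimum norm.

    Hypothetical helper for illustration. For minimization, stepping along -d
    decreases both objectives to first order when d != 0; d == 0 indicates a
    Pareto-stationary point.
    """
    diff = [a - b for a, b in zip(g1, g2)]  # g1 - g2
    denom = sum(v * v for v in diff)
    if denom == 0.0:
        alpha = 0.5  # identical gradients: every convex weight yields the same point
    else:
        # Unconstrained minimizer of ||g2 + alpha*(g1 - g2)||^2, then clipped to [0, 1].
        alpha = sum((b - a) * b for a, b in zip(g1, g2)) / denom
        alpha = min(1.0, max(0.0, alpha))
    d = [alpha * a + (1.0 - alpha) * b for a, b in zip(g1, g2)]
    return alpha, d
```

For example, orthogonal unit gradients `[1.0, 0.0]` and `[0.0, 1.0]` give `alpha = 0.5` and `d = [0.5, 0.5]`, while directly opposing gradients give `d = [0.0, 0.0]`, meaning no common descent direction exists.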