Density Ratio Estimation-based Bayesian Optimization with Semi-Supervised Learning
Authors: Jungtaek Kim
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we show the empirical results of our methods and several baseline methods in two distinct scenarios with unlabeled point sampling and a fixed-size pool, and analyze the validity of our methods in diverse experiments. |
| Researcher Affiliation | Academia | University of Wisconsin–Madison, Madison, WI 53706, USA. Correspondence to: Jungtaek Kim <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: DRE-BO-SSL; Algorithm 2: Labeling Unlabeled Data |
| Open Source Code | No | The paper mentions that its methods are implemented using various libraries, including BayesO (Kim & Choi, 2023), which is an open-source framework for Bayesian optimization. However, it does not explicitly state that the *specific implementation code for the methodology described in this paper* (DRE-BO-SSL) is released, nor does it provide a direct link to it. |
| Open Datasets | Yes | We run several synthetic functions for our methods and the baseline methods. Tabular Benchmarks (Klein & Hutter, 2019); NATS-Bench (Dong et al., 2021); 64D minimum multi-digit MNIST search... in the MNIST dataset (LeCun et al., 1998). |
| Dataset Splits | Yes | To train and test the model fairly, we create a training dataset of 440,000 three-digit images, a validation dataset of 40,000 three-digit images, and a test dataset of 80,000 three-digit images using a training dataset of 55,000 single-digit images, a validation dataset of 5,000 single-digit images, and a test dataset of 10,000 single-digit images in the original MNIST dataset (LeCun et al., 1998). |
| Hardware Specification | Yes | To carry out the experiments in our work, we use dozens of commercial Intel and AMD CPUs such as Intel Xeon Gold 6126 and AMD EPYC 7302. For the experiments on minimum multi-digit MNIST search, the NVIDIA GeForce RTX 3090 GPU is used. |
| Software Dependencies | No | Our proposed methods and baseline methods are implemented with scikit-learn (Pedregosa et al., 2011), PyTorch (Paszke et al., 2019), NumPy (Harris et al., 2020), SciPy (Virtanen et al., 2020), XGBoost (Chen & Guestrin, 2016), and BayesO (Kim & Choi, 2023). While these software components are listed with citations, specific version numbers for the libraries/packages used in the experiments are not provided. |
| Experiment Setup | Yes | All experiments are repeated 20 times with 20 fixed random seeds, where 5 initial points are given to each experiment. Following the previous work by Tiao et al. (2021) and Song et al. (2022), we set a threshold ratio of ζ = 0.33 for all experiments. To solve (6), we use L-BFGS-B (Byrd et al., 1995) with 1,000 different initializations. MLP, BORE, and LFBO: these methods are built with two-layer fully-connected networks, whose architecture is as follows: (i) first layer: fully-connected, input dimensionality d, output dimensionality 32, ReLU; (ii) second layer: fully-connected, input dimensionality 32, output dimensionality 1, logistic. The Adam optimizer (Kingma & Ba, 2015) with learning rate 1 × 10−3 is used to train the network for 100 epochs. |
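The classifier described in the setup row above can be sketched in PyTorch. This is a minimal illustration, not the authors' released code: the input dimensionality `d`, the tensors `x` and `f_vals`, and the BORE-style labeling (points in the best ζ-quantile of objective values get label 1) are assumptions for the example; only the two-layer architecture (d → 32 with ReLU → 1 with logistic), the Adam optimizer with learning rate 1 × 10−3, the 100 epochs, and ζ = 0.33 come from the quoted setup.

```python
import torch
import torch.nn as nn

# Hypothetical input dimensionality and observations; the paper's
# benchmarks would supply real evaluated points instead.
d = 8
x = torch.randn(128, d)    # observed inputs (placeholder)
f_vals = torch.randn(128)  # observed objective values (placeholder)

# BORE-style binary labels with threshold ratio zeta = 0.33:
# label 1 if the objective value falls in the best zeta-quantile.
tau = torch.quantile(f_vals, 0.33)
y = (f_vals <= tau).float().unsqueeze(1)

# Two-layer MLP as specified: d -> 32 (ReLU) -> 1 (logistic).
model = nn.Sequential(
    nn.Linear(d, 32),
    nn.ReLU(),
    nn.Linear(32, 1),
    nn.Sigmoid(),
)

# Adam with learning rate 1e-3, trained for 100 epochs, per the setup.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```

In DRE-based Bayesian optimization the trained classifier's output at a candidate point serves as the acquisition score, which is what the quoted setup maximizes with L-BFGS-B from 1,000 initializations.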