Latent Bayesian Optimization via Autoregressive Normalizing Flows

Authors: Seunghun Lee, Jinyoung Park, Jaewon Chu, Minseo Yoon, Hyunwoo Kim

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments, our NF-BO method demonstrates superior performance in molecule generation tasks, significantly outperforming both traditional and recent LBO approaches. ... We validate our NF-BO across various benchmarks focusing on de novo molecular design tasks. Initially, we conduct experiments on the Guacamol benchmarks (Brown et al., 2019), specifically targeting seven challenging tasks where optimal solutions are not readily found. ... Subsequently, we evaluate our method on the PMO benchmarks (Gao et al., 2022), which consist of 23 tasks, including albuterol similarity and amlodipine MPO.
Researcher Affiliation | Academia | 1. Department of Computer Science and Engineering, Korea University; 2. School of Computing, KAIST
Pseudocode | Yes | For better understanding, the pseudocode for NF-BO is provided in Appendix F. ... F PSEUDOCODE OF NF-BO: This section provides the pseudocode of the NF-BO framework in Algorithm 1.
Open Source Code | Yes | REPRODUCIBILITY STATEMENT: For reproducibility, we elaborate on the overall pipeline of our work in Section 4. In our main paper and appendix, we also illustrate our overall pipeline and pseudocode for NF-BO, respectively. Code is available at https://github.com/mlvlab/NFBO.
Open Datasets | Yes | We validate our NF-BO across various benchmarks focusing on de novo molecular design tasks. Initially, we conduct experiments on the Guacamol benchmarks (Brown et al., 2019)... Subsequently, we evaluate our method on the PMO benchmarks (Gao et al., 2022)... For the Guacamol and PMO benchmarks, we pretrain using the 1.27M unlabeled Guacamol and 250K ZINC datasets, respectively.
Dataset Splits | Yes | For these benchmarks, we evaluate NF-BO and the baselines under three different settings, each varying the number of initial data points and the additional oracle budget: (100, 500), (10,000, 10,000), and (10,000, 70,000). ... We employ 1,000 initial data points and an additional 9,000 oracle calls following the PMO benchmarks.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper does not explicitly mention any specific software dependencies or library versions (e.g., Python, PyTorch, TensorFlow versions) used for the implementation.
Experiment Setup | Yes | We employ Thompson sampling (Eriksson et al., 2019) as the acquisition function, and our surrogate model is a sparse variational Gaussian process (Snelson & Ghahramani, 2005; Hensman et al., 2015; Matthews, 2017) enhanced with a deep kernel (Wilson et al., 2016). ... For (the batch size of trust regions, the number of query points N_q per trust region), we set these parameters to (5, 10) for the Guacamol benchmark with an additional oracle call setting of 500. For other settings, these parameters were adjusted to (10, 100). ... Table 3: Fixed parameters for all tasks and settings. Scaling factor κ in TACS: 0.1. Standard deviation σ of variational distribution q: 0.1. # of top-k data for training: 1000. Coefficient of similarity loss L_sim: 1. ... This probability is defined as: p(x^(i)) = exp(y^(i)/τ') / Σ_j exp(y^(j)/τ'), where y^(i) is the objective value of point i, and τ' is the temperature parameter set to 0.1...
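The temperature-weighted probability quoted above is a softmax over objective values, so higher-scoring points are sampled more often as the temperature shrinks. A minimal sketch of that formula follows; the function name and inputs are illustrative and not taken from the paper's released code, with the temperature set to the paper's reported value of 0.1.

```python
import numpy as np

def sampling_probabilities(y, tau=0.1):
    """Softmax over objective values y with temperature tau:
    p(x^(i)) = exp(y^(i)/tau) / sum_j exp(y^(j)/tau)."""
    y = np.asarray(y, dtype=float)
    z = (y - y.max()) / tau          # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# With tau = 0.1, modest gaps in objective value translate into
# strongly skewed sampling probabilities toward the best points.
probs = sampling_probabilities([0.2, 0.5, 0.9])
```

Subtracting the maximum before exponentiating leaves the probabilities unchanged but avoids overflow when objective values are large relative to the temperature.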