Design Editing for Offline Model-based Optimization

Authors: Ye Yuan, Youyuan Zhang, Can Chen, Haolun Wu, Melody Zixuan Li, Jianmo Li, James J. Clark, Xue Liu

TMLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Empirical evaluations on seven offline MBO tasks show that, with properly tuned hyperparameters, DEMO's score is competitive with the best previously reported scores in the literature. The source code is provided here.
Researcher Affiliation Academia Ye Yuan EMAIL McGill University, Mila - Quebec AI Institute Youyuan Zhang EMAIL McGill University Can (Sam) Chen EMAIL McGill University, Mila - Quebec AI Institute Haolun Wu EMAIL McGill University, Mila - Quebec AI Institute Zixuan (Melody) Li EMAIL McGill University, Mila - Quebec AI Institute Jianmo Li EMAIL McGill University James J. Clark EMAIL McGill University, Mila - Quebec AI Institute Xue Liu EMAIL McGill University, Mila - Quebec AI Institute
Pseudocode Yes Algorithm 1 Design Editing for Offline Model-based Optimization Input: Offline dataset D = {(x_i, y_i)}_{i=1}^N, and a time step m. Output: K candidate optimal designs.
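To make Algorithm 1's input/output signature concrete, here is a minimal sketch of its first stage: gradient-ascending the best offline designs against a surrogate to produce K candidates. The quadratic surrogate, its closed-form gradient, and all names below are hypothetical stand-ins; the full DEMO pipeline would additionally edit these candidates with a diffusion prior trained on the offline data, which is not shown here.

```python
import numpy as np

# Hypothetical toy surrogate: f(x) = -||x - x*||^2, maximized at x*.
# Stands in for the trained surrogate f_theta in the paper.
x_star = np.array([1.0, -2.0, 0.5])

def surrogate(x):
    return float(-np.sum((x - x_star) ** 2))

def surrogate_grad(x):
    return -2.0 * (x - x_star)

def generate_candidates(dataset, eta=1e-2, T=100, K=4):
    """Gradient-ascend the top-K dataset designs against the surrogate
    (the paper uses T = 100 iterations and a step size eta as in Eq. (3))."""
    # Rank offline designs by surrogate score and keep the K best as starts.
    starts = sorted(dataset, key=surrogate, reverse=True)[:K]
    candidates = []
    for x in starts:
        x = x.copy()
        for _ in range(T):
            x = x + eta * surrogate_grad(x)  # move toward higher scores
        candidates.append(x)
    return candidates

rng = np.random.default_rng(0)
dataset = [rng.normal(size=3) for _ in range(32)]
cands = generate_candidates(dataset)
```

Each returned candidate scores at least as well under the surrogate as the offline design it started from; in the real method, the subsequent diffusion-based editing step pulls these (possibly out-of-distribution) candidates back toward the data manifold.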
Open Source Code No The source code is provided here.
Open Datasets Yes We carry out experiments on 7 tasks selected from Design-Bench (Trabucco et al., 2022) and BayesO Benchmarks (Kim, 2023), including 4 continuous tasks and 3 discrete tasks. (vii) NAS (Zoph & Le, 2017), where the aim is to discover the optimal neural network architecture to improve test accuracy on the CIFAR-10 dataset (Hinton et al., 2012), using 1,771 designs.
Dataset Splits No The paper mentions training a surrogate model using an offline dataset (D = {(x_i, y_i)}_{i=1}^N) but does not specify how this dataset is split into training, validation, or test sets. It only mentions generating 128 new designs for evaluation (K = 128), which refers to the output, not a dataset split for model training.
Hardware Specification Yes All experiments are conducted on a workstation with a single Intel Xeon Platinum 8160T CPU and a single NVIDIA Tesla V100 GPU, with execution times per trial ranging from 10 minutes to 20 hours (including evaluation time), depending on the specific tasks.
Software Dependencies No The paper mentions using "The Adam optimizer (Kingma & Ba, 2015)" but does not specify any software libraries (e.g., PyTorch, TensorFlow) or their version numbers, nor the Python version used. Adam is an algorithm, not a software library with a version.
Experiment Setup Yes A 3-layer MLP with ReLU activation is used for both fθ(·) and sϕ(·), with a hidden layer size of 2048. In Algorithm 1, the iteration count, T, is established at 100 for both continuous and discrete tasks. The Adam optimizer (Kingma & Ba, 2015) is utilized to train the surrogate models over 200 epochs with a batch size of 128, and a learning rate set at 10^-1. The step size, η, in Eq. (3) is configured at 10^-3 for continuous tasks and 10^-1 for discrete tasks. The diffusion model, sϕ(·), undergoes training for 200 epochs with a batch size of 128. For the design editing process, following precedents set by previous studies (Krishnamoorthy et al., 2023), we set M at 1000. The selected value of m is 600, with further elaboration provided in Appendix A.2.
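The surrogate-training recipe above (3-layer ReLU MLP, Adam, 200 epochs, batch size 128) can be sketched in plain NumPy with a hand-rolled Adam update. The toy dataset, input dimension, and the smaller hidden width (64 rather than the paper's 2048) are assumptions made so the sketch runs in seconds, and the learning rate is lowered from the reported 10^-1 to 10^-2 to keep this toy run stable; the diffusion model sϕ(·) is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical offline dataset: designs x in R^8, scores y = -||x||^2.
X = rng.normal(size=(512, 8))
y = -np.sum(X ** 2, axis=1, keepdims=True)

# 3-layer ReLU MLP; hidden width shrunk from the paper's 2048 to 64.
sizes = [8, 64, 64, 1]
Ws = [rng.normal(scale=np.sqrt(2.0 / m), size=(m, n))
      for m, n in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    """Return pre-activations and activations of every layer."""
    acts, pres = [x], []
    for i, (W, b) in enumerate(zip(Ws, bs)):
        z = acts[-1] @ W + b
        pres.append(z)
        acts.append(np.maximum(z, 0.0) if i < len(Ws) - 1 else z)
    return pres, acts

def backward(pres, acts, target):
    """Gradients of mean-squared error w.r.t. all weights and biases."""
    delta = 2.0 * (acts[-1] - target) / len(target)
    gWs, gbs = [None] * len(Ws), [None] * len(bs)
    for i in reversed(range(len(Ws))):
        gWs[i] = acts[i].T @ delta
        gbs[i] = delta.sum(axis=0)
        if i > 0:  # propagate through the ReLU of the previous layer
            delta = (delta @ Ws[i].T) * (pres[i - 1] > 0)
    return gWs, gbs

# Minimal Adam (Kingma & Ba, 2015); lr lowered to 1e-2 for this toy run.
flat = Ws + bs
m_t = [np.zeros_like(p) for p in flat]
v_t = [np.zeros_like(p) for p in flat]
step, lr, b1, b2, eps = 0, 1e-2, 0.9, 0.999, 1e-8

for epoch in range(200):                 # 200 epochs, batch size 128
    perm = rng.permutation(len(X))
    for s in range(0, len(X), 128):
        idx = perm[s:s + 128]
        pres, acts = forward(X[idx])
        gWs, gbs = backward(pres, acts, y[idx])
        step += 1
        for p, m, v, g in zip(flat, m_t, v_t, gWs + gbs):
            m[:] = b1 * m + (1 - b1) * g
            v[:] = b2 * v + (1 - b2) * g * g
            p -= lr * (m / (1 - b1 ** step)) / (
                np.sqrt(v / (1 - b2 ** step)) + eps)

mse = float(np.mean((forward(X)[1][-1] - y) ** 2))
```

After training, the surrogate's training MSE should be well below the variance of y (the error of always predicting the mean), confirming the loop learns the toy scoring function.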