reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Robust Guided Diffusion for Offline Black-Box Optimization

Authors: Can Chen, Christopher Beckham, Zixuan Liu, Xue Liu, Christopher Pal

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	RGD achieves state-of-the-art results on various design-bench tasks, underscoring its efficacy. Our code is here. In this section, we conduct comprehensive experiments to evaluate our method's performance. In Tables 1 and 2, we showcase our experimental results for both continuous and discrete tasks. In this section, we present a series of ablation studies to scrutinize the individual contributions of distinct components in our methodology.
Researcher Affiliation	Academia	1McGill University, 2MILA Quebec AI Institute, 3Polytechnique Montreal, 4Canada CIFAR AI Chair, 5University of Washington
Pseudocode	Yes	Algorithm 1 Robust Guided Diffusion for Offline BBO
Open Source Code	Yes	Our code is here. In Appendix D of Song et al. (2021). The implementation can be accessed here. Our code, available here, implements this process as follows:
Open Datasets	Yes	The continuous category includes four tasks: (1) Superconductor (SuperC): The objective here is to engineer a superconductor composed of 86 continuous elements. The goal is to enhance the critical temperature using 17,010 design samples. This task is based on the dataset from Hamidieh (2018). (2) Ant Morphology (Ant): In this task, the focus is on developing a quadrupedal ant robot, comprising 60 continuous parts, to augment its crawling velocity. It uses 10,004 design instances from the dataset in Trabucco et al. (2022); Brockman et al. (2016). (3) D'Kitty Morphology (D'Kitty): Similar to Ant Morphology, this task involves the design of a quadrupedal D'Kitty robot with 56 components, aiming to improve its crawling speed with 10,004 designs, as described in Trabucco et al. (2022); Ahn et al. (2020). (4) Rosenbrock (Rosen): The aim of this task is to optimize a 60-dimension continuous vector to maximize the Rosenbrock black-box function. It uses 50000 designs from the low-scoring part (Rosenbrock, 1960). For the discrete category, we explore three tasks: (1) TF Bind 8 (TF8): The goal is to identify an 8-unit DNA sequence that maximizes binding activity. This task uses 32,898 designs and is detailed in Barrera et al. (2016). (2) TF Bind 10 (TF10): Similar to TF8, but with a 10-unit DNA sequence and a larger pool of 50,000 samples, as described in Barrera et al. (2016). (3) Neural Architecture Search (NAS): This task focuses on discovering the optimal neural network architecture to improve test accuracy on the CIFAR-10 dataset, using 1,771 designs (Zoph & Le, 2017).
Dataset Splits	Yes	Within this context, D_v represents the validation dataset sampled from the offline dataset. The inner optimization task, which seeks the optimal ϕ(α), is efficiently approximated via first-order gradient descent methods. We use batch optimization, with each batch containing 256 training samples and 256 validation samples. The bi-level optimization process updates the hyperparameter with a single iteration for both the inner and outer levels.
Hardware Specification	Yes	These experiments were conducted using a NVIDIA GeForce V100 GPU.
Software Dependencies	No	The paper mentions "PyTorch (Paszke et al., 2019)" but does not specify a version number for it or any other software.
Experiment Setup	Yes	In alignment with the experimental protocols established in Trabucco et al. (2022); Chen et al. (2022b), we have tailored our training methodologies for all approaches, utilizing a three-layer MLP architecture for all involved proxies. We adopted T = 1000 diffusion sampling steps, set the condition y to y_max, and initial strength ω as 2 in line with Krishnamoorthy et al. (2023). To ensure reliability and consistency in our comparative analysis, each experimental setting was replicated across 8 independent runs, unless stated otherwise, with the presentation of both mean values and standard deviations.
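
The Experiment Setup entry reports results as a mean and standard deviation over 8 independent runs. A minimal sketch of that aggregation step follows, using only the Python standard library. The function names and the example scores are hypothetical placeholders, not values from the paper, and the choice of the sample (n - 1) standard deviation is an assumption, since the quoted text does not say which estimator was used.

```python
import statistics


def summarize_runs(scores):
    """Return (mean, sample standard deviation) for per-run scores.

    Assumption: the sample (n - 1) estimator is used; the quoted paper
    text does not state which estimator the authors report.
    """
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    return statistics.mean(scores), statistics.stdev(scores)


def format_result(scores, digits=3):
    """Format per-run scores as 'mean +/- std' for a results table."""
    mean, std = summarize_runs(scores)
    return f"{mean:.{digits}f} +/- {std:.{digits}f}"


# Hypothetical placeholder scores for 8 runs (NOT results from the paper).
example_scores = [0.50, 0.52, 0.48, 0.51, 0.49, 0.53, 0.47, 0.50]
print(format_result(example_scores))
```

If the population estimator is preferred, `statistics.pstdev` can be swapped in for `statistics.stdev`; with only 8 runs the two differ noticeably, so it is worth stating which one a reproduction uses.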