Exploring validation metrics for offline model-based optimisation with diffusion models
Authors: Christopher Beckham, Alexandre Piché, David Vazquez, Christopher Pal
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Exploring validation metrics for offline model-based optimisation with diffusion models ... This involves proposing validation metrics and quantifying them over many datasets for which the ground truth is known, for instance simulated environments. ... we specifically evaluate denoising diffusion models ... we run a large scale study over different hyperparameters and rank these validation metrics by their correlation with the ground truth. (Section 4: Experiments and Discussion) |
| Researcher Affiliation | Collaboration | Christopher Beckham (Mila, Polytechnique Montréal, ServiceNow Research); Alexandre Piché (Mila, Université de Montréal, ServiceNow Research); David Vazquez (ServiceNow Research); Christopher Pal (Mila, Polytechnique Montréal, ServiceNow Research, CIFAR AI Chair) |
| Pseudocode | Yes | Algorithm 1: Training algorithm, with implicit early stopping. Algorithm 2: Final evaluation algorithm. |
| Open Source Code | Yes | 1Corresponding code can be found here: https://github.com/christopher-beckham/validation-metrics-offline-mbo |
| Open Datasets | Yes | We explore five validation metrics in our work against four datasets in the Design Bench (Trabucco et al., 2022) framework, motivating their use as well as describing their advantages and disadvantages. Dataset Our codebase is built on top of the Design Bench (Trabucco et al., 2022) framework. We consider all continuous datasets in Design Bench datasets: Ant Morphology, D Kitty Morphology, Superconductor, and Hopper. |
| Dataset Splits | Yes | Table 2: Summary of datasets used in this work. ... One nuance with the Hopper dataset is that the full dataset D and the training set Dtrain are equivalent... To address this, we compute the median y with respect to Dtrain, and take the lower half as Dtrain and the upper half as Dvalid. ... Note that if a ground truth oracle exists, there is no need to define a Dtest, and this is the case for all datasets except Superconductor (Figure 3b). Otherwise, for Superconductor a random 50% subsample of (Dtrain \ D) is assigned to Dvalid (Figure 3c)... |
| Hardware Specification | Yes | Experiments are trained for 5000 epochs, each on a single P100 GPU. |
| Software Dependencies | No | The paper mentions the ADAM optimiser (Kingma & Ba, 2014) and Hugging Face's annotated diffusion model, but it does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | For all experiments we train with the ADAM optimiser (Kingma & Ba, 2014), with a learning rate of 2 × 10⁻⁵, β = (0.0, 0.9), and diffusion timesteps T = 200. Experiments are trained for 5000 epochs... Here we list hyperparameters that differ between experiments: diffusion_kwargs.tau... gen_kwargs.dim... diffusion_kwargs.w_cg... Hyperparameters explored for classifier-free guidance: { diffusion_kwargs.tau: {0.05, 0.1, 0.2, 0.4, 0.5}, gen_kwargs.dim: {128, 256} }. Hyperparameters explored for classifier guidance: { diffusion_kwargs.w_cg: {1.0, 10.0, 100.0}, epochs: {5000, 10000}, gen_kwargs.dim: {128, 256} } |
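The Hopper split described in the Dataset Splits row can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the full dataset D equals Dtrain, so the scores y are split at their median, with the lower half forming Dtrain and the upper half Dvalid. All array names here are hypothetical.

```python
import numpy as np

def median_split(xs, ys):
    """Split (xs, ys) at the median of ys: lower half -> train, upper half -> valid."""
    med = np.median(ys)
    lower = ys <= med
    return (xs[lower], ys[lower]), (xs[~lower], ys[~lower])

# Hypothetical stand-in data for the Hopper designs and their scores.
rng = np.random.default_rng(0)
xs = rng.normal(size=(100, 8))   # design inputs (shape is illustrative)
ys = rng.normal(size=100)        # ground-truth scores
(train_x, train_y), (valid_x, valid_y) = median_split(xs, ys)
```

Under this split every validation score exceeds every training score, which matches the paper's intent of validating on designs better than those seen in training.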
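The hyperparameter sweeps listed in the Experiment Setup row can be enumerated as a Cartesian product. The dotted key names mirror the paper's config; the `grid` helper itself is a hypothetical sketch, not part of the released codebase.

```python
from itertools import product

# Grids quoted from the paper's experiment setup.
classifier_free_guidance = {
    "diffusion_kwargs.tau": [0.05, 0.1, 0.2, 0.4, 0.5],
    "gen_kwargs.dim": [128, 256],
}
classifier_guidance = {
    "diffusion_kwargs.w_cg": [1.0, 10.0, 100.0],
    "epochs": [5000, 10000],
    "gen_kwargs.dim": [128, 256],
}

def grid(space):
    """Expand a dict of lists into a list of per-run config dicts."""
    keys = list(space)
    return [dict(zip(keys, vals)) for vals in product(*space.values())]

runs = grid(classifier_free_guidance) + grid(classifier_guidance)
# 5*2 classifier-free configs + 3*2*2 classifier-guided configs = 22 runs
```

This makes the scale of the study concrete: 22 hyperparameter configurations per dataset before any seed replication.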