Multi-conditioned Graph Diffusion for Neural Architecture Search

Authors: Rohan Asthana, Joschua Conrad, Youssef Dawoud, Maurits Ortmanns, Vasileios Belagiannis

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type Experimental In our evaluations, we show promising results on six standard benchmarks, yielding novel and unique architectures at a fast speed, i.e. less than 0.2 seconds per architecture. Furthermore, we demonstrate the generalisability and efficiency of our method through experiments on the ImageNet dataset.
Researcher Affiliation Academia Rohan Asthana EMAIL Friedrich-Alexander-Universität Erlangen-Nürnberg Erlangen, Germany Joschua Conrad EMAIL Universität Ulm Ulm, Germany Youssef Dawoud EMAIL Friedrich-Alexander-Universität Erlangen-Nürnberg Erlangen, Germany Maurits Ortmanns EMAIL Universität Ulm Ulm, Germany Vasileios Belagiannis EMAIL Friedrich-Alexander-Universität Erlangen-Nürnberg Erlangen, Germany
Pseudocode Yes Algorithm 1 Training DiNAS Algorithm 2 Sampling from DiNAS
Open Source Code Yes The code for our paper is available at https://github.com/rohanasthana/DiNAS.
Open Datasets Yes We evaluate our approach on six standard benchmarks encompassing tabular, surrogate, hardware-aware benchmarks, and the challenging ImageNet image classification task (Deng et al., 2009). Tabular Benchmarks: We first consider the tabular benchmarks NAS-Bench-101 (Ying et al., 2019) and NAS-Bench-201 (Dong & Yang, 2020) for our experiments. We perform our experiments on two surrogate benchmarks, NAS-Bench-301 (Siems et al., 2021) (trained on CIFAR-10 (Krizhevsky et al., 2009)) on the DARTS search space and NAS-Bench-NLP. Our next evaluation is on the Hardware-Aware Benchmark (HW-NAS-Bench) (Li et al., 2021). NAS-Bench-NLP provides 14,322 architectures trained on the Penn Treebank dataset (Marcus et al., 1993).
Dataset Splits Yes The evaluation protocol follows the established standard (Yan et al., 2020; Wu et al., 2021) of conducting a search for the maximum validation accuracy within a fixed number of queries and reporting the corresponding test accuracy, both as a mean over 10 runs. NAS-Bench-101 (Ying et al., 2019) is a cell-based tabular benchmark, comprising a large collection of 423,624 distinct architectures represented as cells. These architectures are also mapped to their respective validation and test accuracy metrics, evaluated on the CIFAR-10 image classification task. This involves training and evaluating the best generated architecture from NAS-Bench-301 (trained on the CIFAR-10 image classification task) on the ImageNet dataset.
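The query-budget protocol quoted above (search for the maximum validation accuracy within a fixed number of queries, report the matching test accuracy, average over 10 runs) can be sketched as follows. This is a hedged illustration, not the paper's code: `lookup`, `candidates`, and the toy accuracy table are invented stand-ins for a tabular benchmark such as NAS-Bench-101.

```python
import random
import statistics

def evaluate_run(lookup, candidates, budget, rng):
    """One search run: query `budget` architectures, keep the test accuracy
    of the one with the best validation accuracy."""
    best_val, best_test = float("-inf"), float("-inf")
    for arch in rng.sample(candidates, budget):  # fixed number of queries
        val_acc, test_acc = lookup(arch)
        if val_acc > best_val:  # track max validation accuracy...
            best_val, best_test = val_acc, test_acc  # ...and its test accuracy
    return best_test

def protocol(lookup, candidates, budget, runs=10, seed=0):
    """Mean test accuracy over 10 independent runs, as in the protocol above."""
    rng = random.Random(seed)
    return statistics.mean(
        evaluate_run(lookup, candidates, budget, rng) for _ in range(runs)
    )

# Toy benchmark table: architecture index -> (validation acc, test acc).
table = {i: (90 + i * 0.005, 89 + i * 0.005) for i in range(1000)}
mean_test = protocol(lambda a: table[a], list(table), budget=50)
```

Real benchmarks replace `table` with the precomputed accuracies shipped by NAS-Bench-101/201; the search strategy (here uniform sampling) is whatever method is being evaluated.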
Hardware Specification Yes i.e. less than 0.2 seconds per architecture. All the models were trained for 100 epochs with a learning rate of 0.0002, batch size of 16, weight decay of 10^-12, guidance scale of -4 and using the AdamW optimiser (Loshchilov & Hutter, 2017) on a single NVIDIA A6000 GPU. ... The ImageNet training is performed on 3 NVIDIA V100 GPUs in parallel in a distributed manner.
Software Dependencies No The paper does not provide specific software dependencies with version numbers. It mentions using the AdamW optimiser (Loshchilov & Hutter, 2017) and XGBoost (Chen & Guestrin, 2016), which are algorithms/models, but not versioned software libraries or environments. It also refers to using the training pipeline and code from other papers without specifying versions of the underlying software components used in their own implementation.
Experiment Setup Yes All the models were trained for 100 epochs with a learning rate of 0.0002, batch size of 16, weight decay of 10^-12, guidance scale of -4 and using the AdamW optimiser (Loshchilov & Hutter, 2017) on a single NVIDIA A6000 GPU. For noising, we use a cosine noise schedule for T = 500 time-steps. ... For the evaluation on ImageNet, we employ the same training pipeline and code as AG-Net (Lukasik et al., 2022) and TENAS (Chen et al., 2021a), taken from Chen (2022). We train the best generated architecture in terms of validation accuracy from NAS-Bench-301 on ImageNet for 250 epochs. The initial learning rate is set to 0.5 with a cosine learning rate scheduler and the batch size is set to 1024.
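The cosine noise schedule with T = 500 time-steps quoted in the setup can be sketched as below. This is a minimal illustration of the standard cosine schedule (Nichol & Dhariwal, 2021); the offset s = 0.008 and the 0.999 beta clip are the defaults from that schedule, assumed here since the excerpt does not state them.

```python
import math

def cosine_betas(T=500, s=0.008, max_beta=0.999):
    """Per-step noise rates beta_t from the cosine alpha-bar schedule."""
    def alpha_bar(t):
        # Cumulative signal fraction at step t (Nichol & Dhariwal, 2021).
        return math.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2
    # beta_t = 1 - alpha_bar(t+1)/alpha_bar(t), clipped for stability.
    return [min(1 - alpha_bar(t + 1) / alpha_bar(t), max_beta)
            for t in range(T)]

betas = cosine_betas()  # 500 values, small at t=0, approaching 0.999 at t=T
```

The remaining quoted hyperparameters (AdamW, lr 0.0002, weight decay 10^-12, batch size 16) plug into a standard optimizer setup and need no schedule-specific code.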