Multi-conditioned Graph Diffusion for Neural Architecture Search

Authors: Rohan Asthana, Joschua Conrad, Youssef Dawoud, Maurits Ortmanns, Vasileios Belagiannis

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type Experimental In our evaluations, we show promising results on six standard benchmarks, yielding novel and unique architectures at a fast speed, i.e. less than 0.2 seconds per architecture. Furthermore, we demonstrate the generalisability and efficiency of our method through experiments on the ImageNet dataset.
Researcher Affiliation Academia Rohan Asthana EMAIL Friedrich-Alexander-Universität Erlangen-Nürnberg Erlangen, Germany Joschua Conrad EMAIL Universität Ulm Ulm, Germany Youssef Dawoud EMAIL Friedrich-Alexander-Universität Erlangen-Nürnberg Erlangen, Germany Maurits Ortmanns EMAIL Universität Ulm Ulm, Germany Vasileios Belagiannis EMAIL Friedrich-Alexander-Universität Erlangen-Nürnberg Erlangen, Germany
Pseudocode Yes Algorithm 1 Training DiNAS Algorithm 2 Sampling from DiNAS
Open Source Code Yes The code for our paper is available at https://github.com/rohanasthana/DiNAS.
Open Datasets Yes We evaluate our approach on six standard benchmarks encompassing tabular, surrogate, hardware-aware benchmarks, and the challenging ImageNet image classification task (Deng et al., 2009). Tabular Benchmarks: We first consider the tabular benchmarks NAS-Bench-101 (Ying et al., 2019) and NAS-Bench-201 (Dong & Yang, 2020) for our experiments. We perform our experiments on two surrogate benchmarks, NAS-Bench-301 (Siems et al., 2021) (trained on CIFAR-10 (Krizhevsky et al., 2009)) on the DARTS search space and NAS-Bench-NLP. Our next evaluation is on the Hardware-Aware Benchmark (HW-NAS-Bench) (Li et al., 2021). NAS-Bench-NLP provides 14,322 architectures trained on the Penn Treebank dataset (Marcus et al., 1993).
Dataset Splits Yes The evaluation protocol follows the established standard (Yan et al., 2020; Wu et al., 2021) of conducting a search for the maximum validation accuracy within a fixed number of queries and reporting the corresponding test accuracy, both as a mean over 10 runs. NAS-Bench-101 (Ying et al., 2019) is a cell-based tabular benchmark, comprising a large collection of 423,624 distinct architectures represented as cells. These architectures are also mapped to their respective validation and test accuracy metrics, evaluated on the CIFAR-10 image classification task. This involves training and evaluating the best generated architecture from NAS-Bench-301 (trained on the CIFAR-10 image classification task) on the ImageNet dataset.
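The query-budget protocol quoted above (search for the maximum validation accuracy within a fixed number of queries, report the matching test accuracy, average over 10 runs) can be sketched as follows. This is a hedged illustration, not the paper's code: `lookup`, `candidates`, and the toy accuracy table are invented stand-ins for a tabular benchmark such as NAS-Bench-101.

```python
import random
import statistics

def evaluate_run(lookup, candidates, budget, rng):
    """One search run: query `budget` architectures, keep the test accuracy
    of the one with the best validation accuracy."""
    best_val, best_test = float("-inf"), float("-inf")
    for arch in rng.sample(candidates, budget):  # fixed number of queries
        val_acc, test_acc = lookup(arch)
        if val_acc > best_val:  # track max validation accuracy...
            best_val, best_test = val_acc, test_acc  # ...and its test accuracy
    return best_test

def protocol(lookup, candidates, budget, runs=10, seed=0):
    """Mean test accuracy over 10 independent runs, as in the protocol above."""
    rng = random.Random(seed)
    return statistics.mean(
        evaluate_run(lookup, candidates, budget, rng) for _ in range(runs)
    )

# Toy benchmark table: architecture index -> (validation acc, test acc).
table = {i: (90 + i * 0.005, 89 + i * 0.005) for i in range(1000)}
mean_test = protocol(lambda a: table[a], list(table), budget=50)
```

Real benchmarks replace `table` with the precomputed accuracies shipped by NAS-Bench-101/201; the search strategy (here uniform sampling) is whatever method is being evaluated.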
Hardware Specification Yes i.e. less than 0.2 seconds per architecture. All the models were trained for 100 epochs with a learning rate of 0.0002, batch size of 16, weight decay of 10^-12, guidance scale of -4 and using the AdamW optimiser (Loshchilov & Hutter, 2017) on a single NVIDIA A6000 GPU. ... The ImageNet training is performed on 3 NVIDIA V100 GPUs in parallel in a distributed manner.
Software Dependencies No The paper does not provide specific software dependencies with version numbers. It mentions using the AdamW optimiser (Loshchilov & Hutter, 2017) and XGBoost (Chen & Guestrin, 2016), which are algorithms/models, but not versioned software libraries or environments. It also refers to using the training pipeline and code from other papers without specifying versions of the underlying software components used in their own implementation.
Experiment Setup Yes All the models were trained for 100 epochs with a learning rate of 0.0002, batch size of 16, weight decay of 10^-12, guidance scale of -4 and using the AdamW optimiser (Loshchilov & Hutter, 2017) on a single NVIDIA A6000 GPU. For noising, we use a cosine noise schedule for T = 500 time-steps. ... For the evaluation on ImageNet, we employ the same training pipeline and code as AG-Net (Lukasik et al., 2022) and TENAS (Chen et al., 2021a), taken from Chen (2022). We train the best generated architecture in terms of validation accuracy from NAS-Bench-301 on ImageNet for 250 epochs. The initial learning rate is set to 0.5 with a cosine learning rate scheduler and the batch size is set to 1024.
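The cosine noise schedule with T = 500 time-steps quoted in the setup can be sketched as below. This is a minimal illustration of the standard cosine schedule (Nichol & Dhariwal, 2021); the offset s = 0.008 and the 0.999 beta clip are the defaults from that schedule, assumed here since the excerpt does not state them.

```python
import math

def cosine_betas(T=500, s=0.008, max_beta=0.999):
    """Per-step noise rates beta_t from the cosine alpha-bar schedule."""
    def alpha_bar(t):
        # Cumulative signal fraction at step t (Nichol & Dhariwal, 2021).
        return math.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2
    # beta_t = 1 - alpha_bar(t+1)/alpha_bar(t), clipped for stability.
    return [min(1 - alpha_bar(t + 1) / alpha_bar(t), max_beta)
            for t in range(T)]

betas = cosine_betas()  # 500 values, small at t=0, approaching 0.999 at t=T
```

The remaining quoted hyperparameters (AdamW, lr 0.0002, weight decay 10^-12, batch size 16) plug into a standard optimizer setup and need no schedule-specific code.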