Multi-objective Differentiable Neural Architecture Search

Authors: Rhea Sukthanker, Arber Zela, Benedikt Staffler, Samuel Dooley, Josif Grabocka, Frank Hutter

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments involving up to 19 hardware devices and 3 different objectives demonstrate the effectiveness and scalability of our method. Finally, we show that, without any additional costs, our method outperforms existing MOO NAS methods across a broad range of qualitatively different search spaces and datasets, including MobileNetV3 on ImageNet-1k, an encoder-decoder transformer space for machine translation and a decoder-only space for language modelling.
Researcher Affiliation Collaboration 1 University of Freiburg, 2 Bosch Center for AI, 3 Meta, 4 University of Technology Nuremberg, 5 ELLIS Institute Tübingen
Pseudocode Yes Algorithm 1: MODNAS. Data: D_train; D_valid; supernetwork; device features {d_t}_{t=1}^T; Meta-Hypernetwork H_Φ; number of objectives M; Architect Λ; learning rates ξ_1, ξ_2.
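The bi-level update behind Algorithm 1 can be sketched in a toy numpy setting. Everything below — the quadratic error/latency objectives, the linear hypernetwork, the device features, and all constants — is a hypothetical illustration of the alternating lower-level (supernet weights on the train split) and upper-level (hypernetwork on the valid split) steps, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: K candidate operations, one device.
# All constants here are illustrative, not values from the paper.
K = 4
d = np.array([0.3, 0.7])                   # device feature vector
cost = np.array([1.0, 0.5, 0.2, 0.8])      # latency proxy per operation
t_train = np.array([0.9, 0.4, 0.1, 0.6])   # train targets (error objective)
t_valid = t_train + 0.05                   # valid targets

w = rng.normal(size=K)                        # supernet weights (lower level)
Phi = 0.1 * rng.normal(size=(K, 2 + d.size))  # meta-hypernetwork (upper level)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def alpha_of(lam):
    """Architecture distribution produced by the hypernetwork for (lambda, d)."""
    return softmax(Phi @ np.concatenate([lam, d]))

def scalarized_loss(lam, alpha, targets):
    err = alpha @ (w - targets) ** 2   # objective 1: error proxy
    lat = alpha @ cost                 # objective 2: latency proxy
    return lam[0] * err + lam[1] * lat

lam_eval = np.array([0.5, 0.5])
loss_before = scalarized_loss(lam_eval, alpha_of(lam_eval), t_valid)

xi1, xi2 = 0.1, 0.05                   # learning rates xi_1, xi_2
for _ in range(200):
    lam = rng.dirichlet(np.ones(2))    # sample a random scalarization
    x = np.concatenate([lam, d])
    alpha = softmax(Phi @ x)

    # Lower level: gradient step on supernet weights w (train split).
    w -= xi1 * lam[0] * alpha * 2 * (w - t_train)

    # Upper level (first-order): gradient step on Phi (valid split).
    g_alpha = lam[0] * (w - t_valid) ** 2 + lam[1] * cost
    J = np.diag(alpha) - np.outer(alpha, alpha)   # softmax Jacobian
    Phi -= xi2 * np.outer(J @ g_alpha, x)

loss_after = scalarized_loss(lam_eval, alpha_of(lam_eval), t_valid)
```

After training, the scalarized validation loss at a fixed preference drops below its initial value, mirroring the roles of the train and valid splits in the two levels of the algorithm.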
Open Source Code Yes To facilitate reproducibility, we provide our code in https://github.com/automl/modnas.
Open Datasets Yes We evaluate MODNAS on 4 search spaces: (1) NAS-Bench-201 (Dong & Yang, 2020) with 19 devices and the CIFAR-10 dataset; (2) MobileNetV3 from Once-for-All (OFA) (Cai et al., 2020) with 12 devices and the ImageNet-1k dataset; (3) Hardware-Aware-Transformer (HAT) (Wang et al., 2020b) on the machine translation benchmark WMT'14 En-De across 3 different hardware devices; (4) HW-GPT-Bench (Sukthanker et al., 2024), a GPT-2-based search space used for language modeling on OpenWebText (Gokaslan & Cohen, 2019) across 8 devices.
Dataset Splits Yes L^train_t and L^valid_t are the vectors with all M loss functions evaluated on the train and validation splits of D, used in the lower- and upper-level problems of (4), respectively.
Hardware Specification Yes We run the MODNAS search (see Appendix D for more details on the search hyperparameters), as described in Algorithm 1, for 100 epochs (22 GPU hours on a single NVIDIA RTX 2080 Ti) and show in Figure 3 the hypervolume (HV) of the evaluated Pareto front in comparison to the baselines, for which we allocate the same search time budget across all devices, equivalent to the MODNAS search + evaluation.
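The HV (hypervolume) metric used to compare Pareto fronts above can be computed, for two objectives, with a standard sweep over the front. This is a generic minimal sketch of the 2-D minimization case, not the paper's evaluation code:

```python
def hypervolume_2d(points, ref):
    """Area dominated by a 2-D front w.r.t. reference point `ref`,
    assuming both objectives are minimized (lower is better)."""
    # Keep only points that strictly dominate the reference, sorted by x.
    pts = sorted((p for p in points if p[0] < ref[0] and p[1] < ref[1]),
                 key=lambda p: p[0])
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y < prev_y:                       # skip dominated points
            hv += (ref[0] - x) * (prev_y - y)  # add the new rectangle strip
            prev_y = y
    return hv
```

For example, the front {(1,3), (2,2), (3,1)} with reference (4,4) dominates three unit-height strips of widths 3, 2, and 1, giving HV = 6; dominated points contribute nothing.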
Software Dependencies No The paper does not provide specific software dependencies with version numbers.
Experiment Setup Yes In Table 2, we show the search hyperparameters and the corresponding values we use to conduct our experiments with MODNAS. For the convolutional spaces, we subtract a cosine similarity penalty from the scalarized loss, following (Ruchte & Grabocka, 2021):
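The penalized scalarization referenced above can be sketched as follows: the scalarized loss λ·L minus a cosine-similarity term between the preference vector λ and the loss vector L (Ruchte & Grabocka, 2021). The coefficient `gamma` and the function name are hypothetical, not values or identifiers from the paper:

```python
import numpy as np

def cosine_penalized_scalarization(lam, losses, gamma=0.1):
    """Scalarized loss lam @ losses minus gamma * cos(lam, losses).
    `gamma` is an illustrative penalty coefficient (assumption)."""
    lam = np.asarray(lam, dtype=float)
    losses = np.asarray(losses, dtype=float)
    cos = (lam @ losses) / (np.linalg.norm(lam) * np.linalg.norm(losses))
    return float(lam @ losses - gamma * cos)
```

Subtracting the cosine term rewards loss vectors aligned with the sampled preference direction, which encourages the solutions for different λ to spread along the Pareto front instead of collapsing to one trade-off.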