Multi-objective Differentiable Neural Architecture Search
Authors: Rhea Sukthanker, Arber Zela, Benedikt Staffler, Samuel Dooley, Josif Grabocka, Frank Hutter
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments involving up to 19 hardware devices and 3 different objectives demonstrate the effectiveness and scalability of our method. Finally, we show that, without any additional costs, our method outperforms existing MOO NAS methods across a broad range of qualitatively different search spaces and datasets, including MobileNetV3 on ImageNet-1k, an encoder-decoder transformer space for machine translation and a decoder-only space for language modelling. |
| Researcher Affiliation | Collaboration | 1 University of Freiburg, 2 Bosch Center for AI, 3 Meta, 4 University of Technology Nuremberg, 5 ELLIS Institute Tübingen |
| Pseudocode | Yes | Algorithm 1: MODNAS. Data: D_train; D_valid; Supernetwork; device features {d_t}_{t=1}^T; Meta-Hypernetwork H_Φ; number of objectives M; Architect Λ; learning rates ξ_1, ξ_2. |
| Open Source Code | Yes | To facilitate reproducibility, we provide our code in https://github.com/automl/modnas. |
| Open Datasets | Yes | We evaluate MODNAS on 4 search spaces: (1) NAS-Bench-201 (Dong & Yang, 2020) with 19 devices and the CIFAR-10 dataset; (2) MobileNetV3 from Once-for-All (OFA) (Cai et al., 2020) with 12 devices and the ImageNet-1k dataset; (3) Hardware-Aware-Transformer (HAT) (Wang et al., 2020b) on the machine translation benchmark WMT'14 En-De across 3 different hardware devices; (4) HW-GPT-Bench (Sukthanker et al., 2024), a GPT-2-based search space used for language modeling on OpenWebText (Gokaslan & Cohen, 2019) across 8 devices. |
| Dataset Splits | Yes | L_t^train and L_t^valid are the vectors with all M loss functions evaluated on the train and validation splits of D, used in the lower- and upper-level problems of (4), respectively. |
| Hardware Specification | Yes | We run the MODNAS search (see Appendix D for more details on the search hyperparameters), as described in Algorithm 1, for 100 epochs (22 GPU hours on a single NVIDIA RTX 2080 Ti) and show the HV in Figure 3 of the evaluated Pareto front in comparison to the baselines, for which we allocate the same search time budget across all devices equivalent to the MODNAS search + evaluation. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | In Table 2, we show the search hyperparameters and their corresponding values we use to conduct our experiments with MODNAS. For the convolutional spaces we subtract a cosine similarity penalty from the scalarized loss, following (Ruchte & Grabocka, 2021). |
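The "cosine similarity penalty" mentioned in the Experiment Setup row follows Ruchte & Grabocka (2021): the per-objective losses are combined via linear scalarization with a preference vector, and a cosine-similarity term between the preference and loss vectors is subtracted to encourage the solution to align with the chosen preference ray. A minimal stdlib-only sketch (the function name `scalarized_loss` and the default penalty weight `mu` are assumptions for illustration, not values from the paper):

```python
import math

def scalarized_loss(losses, prefs, mu=0.1):
    """Linear scalarization of M objective losses with a subtracted
    cosine-similarity penalty (hypothetical sketch; `mu` is an
    assumed penalty strength, not taken from the paper)."""
    # Weighted sum of losses under the preference vector
    scalar = sum(p * l for p, l in zip(prefs, losses))
    # Cosine similarity between the preference and loss vectors
    dot = sum(p * l for p, l in zip(prefs, losses))
    norm_p = math.sqrt(sum(p * p for p in prefs))
    norm_l = math.sqrt(sum(l * l for l in losses))
    cos = dot / (norm_p * norm_l)
    # Subtracting the similarity rewards loss vectors aligned
    # with the preference ray
    return scalar - mu * cos

# Example: two objectives (e.g. accuracy loss and latency) with
# equal preference weights
print(scalarized_loss([1.0, 2.0], [0.5, 0.5]))  # ≈ 1.4051
```

In practice the losses would be differentiable tensors (e.g. PyTorch) so gradients flow through both the scalarization and the penalty term.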
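Algorithm 1's Architect Λ scalarizes the M objectives under a sampled preference vector on the simplex. A common way to draw such a vector is to normalize Gamma draws, which yields a symmetric Dirichlet sample; this stdlib-only sketch assumes that construction (the specific sampling distribution MODNAS uses is not stated in the excerpt above):

```python
import random

def sample_preference(m, alpha=1.0, rng=random):
    """Sample an m-dimensional preference vector from a symmetric
    Dirichlet(alpha) by normalizing Gamma(alpha, 1) draws.
    Hypothetical helper; MODNAS's actual sampling scheme may differ."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(m)]
    s = sum(g)
    return [x / s for x in g]

# Example: a preference over 3 objectives
# (e.g. error, latency, energy); entries are non-negative
# and sum to 1.
prefs = sample_preference(3)
print(prefs)
```

Each sampled vector would then be fed, together with the device features d_t, to the Meta-Hypernetwork H_Φ to produce an architecture distribution for that preference/device pair.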