Variation Matters: from Mitigating to Embracing Zero-Shot NAS Ranking Function Variation

Authors: Pavel Rumiantsev, Mark Coates

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that the proposed stochastic ordering can effectively boost performance of a search on standard benchmark search spaces. ... We present results for the NAS-Bench search spaces in Fig. 1a and results for the TransNAS search spaces in Fig. 1b. ... Results are presented in Table 1.
Researcher Affiliation | Academia | Pavel Rumiantsev (EMAIL), Department of Electrical and Computer Engineering, McGill University; Mark Coates (EMAIL), Department of Electrical and Computer Engineering, McGill University
Pseudocode | Yes | Algorithm 1: Statistical MAX and TOP-K pseudocode ... Algorithm 2: Regularised evolutionary algorithm (REA) ... Algorithm 3: Greedy evolutionary search algorithm, difference with Algorithm 2 ... Algorithm 4: Free regularised evolutionary algorithm (FreeREA), difference with Algorithm 2
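The regularised evolutionary algorithm (REA) listed as Algorithm 2 follows the well-known aging-evolution loop of Real et al. (2019): tournament selection, mutate the winner, retire the oldest population member. A minimal Python sketch under that assumption; `score_fn`, `random_arch`, and `mutate` are hypothetical placeholders for the ranking function, the random architecture sampler, and the mutation operator, not the paper's actual implementation:

```python
import collections
import random

def regularized_evolution(score_fn, random_arch, mutate,
                          population_size=64, cycles=1000, sample_size=10):
    """Sketch of regularised evolution (aging evolution).

    Keeps a FIFO population: each cycle samples a tournament, mutates the
    tournament winner, appends the child, and removes the oldest member.
    Returns the best architecture seen over the whole run.
    """
    population = collections.deque()
    history = []
    # Initialise the population with randomly sampled architectures.
    while len(population) < population_size:
        arch = random_arch()
        entry = (arch, score_fn(arch))
        population.append(entry)
        history.append(entry)
    # Evolution loop: tournament -> mutate winner -> age out the oldest.
    for _ in range(cycles):
        tournament = random.sample(list(population), sample_size)
        parent = max(tournament, key=lambda e: e[1])
        child = mutate(parent[0])
        entry = (child, score_fn(child))
        population.append(entry)
        history.append(entry)
        population.popleft()  # "regularisation": oldest member dies
    return max(history, key=lambda e: e[1])[0]
```

The greedy and FreeREA variants in Algorithms 3 and 4 are described in the paper only as differences from this loop (e.g., how the parent is chosen), so they are not reproduced here.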
Open Source Code | No | The paper lists third-party software (automl/NASLib, PyTorch, NumPy, SciPy) and datasets with their licences and citations, but does not provide specific access (a link or explicit statement) to the authors' own implementation of the methodology described in the paper.
Open Datasets | Yes | The following datasets were used: CIFAR-10/100 (Krizhevsky, 2009) under the CC BY 4.0 licence; ImageNet-16-120 (Chrabaszcz et al., 2017) under the CC BY 4.0 licence; NinaPro (Atzori et al., 2012) under the CC BY-ND licence; five datasets from the Taskonomy collection (Zamir et al., 2018) under the CC BY 4.0 licence.
Dataset Splits | Yes | For this work, we view a search space as a combination of a feasible architecture set and a dataset of labelled training, validation, and test samples. ... standard architectural search spaces including NAS-Bench-101, NAS-Bench-201, and TransNAS-Bench-101. ... NAS-Bench-101 (Ying et al., 2019) includes the performance and training statistics of the 423k architectures on CIFAR-10. ... NAS-Bench-201 (Dong & Yang, 2019) has 15625 architectures in the search space and provides performance for CIFAR-10, CIFAR-100, and ImageNet-16-120.
Hardware Specification | No | The paper does not explicitly mention the specific hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | In this work, we used the following software: automl/NASLib (Mehta et al., 2022), available on GitHub under the Apache 2.0 licence; PyTorch (Paszke et al., 2019), available via PyPI under a custom BSD licence; NumPy (Harris et al., 2020), available via PyPI under a custom BSD licence; SciPy (Virtanen et al., 2020), available via PyPI under a custom BSD licence. While software names are listed, specific version numbers are not provided for PyTorch, NumPy, or SciPy.
Experiment Setup | Yes | We set V = 10 and B = 64. ... The 5% significance level is used for rejecting the null hypothesis when conducting the Mann-Whitney U-test. ... We repeat each experiment 100 times, computing the mean value of the accuracy for the selected architecture and its variation. ... In the case of averaging, we cache the average of the ranking function output over 10 evaluations. ... We use an evaluation budget and cap it at 1000 evolution cycles. ... The optimal threshold lies within the range from 0.025 to 0.075.
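The quoted setup pairs V = 10 ranking-function evaluations per architecture with a Mann-Whitney U-test at the 5% significance level, which is the core of the paper's "statistical MAX" selection (Algorithm 1). A minimal sketch of how such a selection could be wired up with SciPy; the pairwise keep-or-replace acceptance logic below is an assumption for illustration, not the paper's exact algorithm:

```python
import numpy as np
from scipy.stats import mannwhitneyu

def statistical_max(score_samples, alpha=0.05):
    """Pick the architecture whose noisy scores are statistically largest.

    score_samples: dict mapping architecture id -> array of V evaluations
    of a stochastic zero-shot ranking function (V = 10 in the paper).
    A candidate replaces the current best only when a one-sided
    Mann-Whitney U-test rejects the null at level `alpha`.
    """
    arch_ids = list(score_samples)
    best = arch_ids[0]
    for cand in arch_ids[1:]:
        # H0: the candidate's scores are not stochastically greater
        # than the current best's scores.
        _, p_value = mannwhitneyu(score_samples[cand], score_samples[best],
                                  alternative="greater")
        if p_value < alpha:
            best = cand
    return best
```

Compared with simply caching the mean of 10 evaluations (the averaging baseline quoted above), this kind of test only switches architectures when the score difference is unlikely to be evaluation noise.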