How Much Pre-training Is Enough to Discover a Good Subnetwork?

Authors: Cameron R. Wolfe, Fangshuo Liao, Qihan Wang, Junhyung Lyle Kim, Anastasios Kyrillidis

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Lastly, we empirically validate our theoretical results on multi-layer perceptrons and residual-based convolutional networks trained on MNIST, CIFAR, and ImageNet datasets."
Researcher Affiliation | Academia | Cameron R. Wolfe, Fangshuo Liao, Qihan Wang, Junhyung Lyle Kim, and Anastasios Kyrillidis: Department of Computer Science, Rice University.
Pseudocode | Yes | Algorithm 1: Greedy Forward Selection; Algorithm 2: Greedy Forward Selection for Deep CNNs; Algorithm 3: Distributed Greedy Forward Selection for Two-Layer Networks.
Open Source Code | No | The paper mentions using a "public implementation of greedy forward selection (Ye, 2021)" but does not state that the authors release their own code or the specific adaptations used in this work; the reference is to a third party's public code.
Open Datasets | Yes | "Lastly, we empirically validate our theoretical results on multi-layer perceptrons and residual-based convolutional networks trained on MNIST, CIFAR, and ImageNet datasets. We perform structured pruning experiments with two-layer networks on MNIST (Deng, 2012) by pruning hidden neurons via greedy forward selection. We perform structured pruning experiments (i.e., channel-based pruning) using ResNet34 (He et al., 2015) and MobileNetV2 (Sandler et al., 2018) architectures on CIFAR10 and ImageNet (Krizhevsky et al., 2009; Deng et al., 2009)."
Dataset Splits | Yes | "To study how dataset size affects subnetwork performance, we construct sub-datasets of sizes 1K to 50K (i.e., in increments of 5K) from the original MNIST dataset by uniformly sampling examples from the ten original classes. Three CIFAR10 sub-datasets of size 10K, 30K, and 50K (i.e., full dataset) are created using uniform sampling across classes. This grid search is performed using a validation set on CIFAR10, constructed using a random 80-20 split on the training dataset."
Hardware Specification | Yes | "Experiments are run on an internal cluster with two Nvidia RTX 3090 GPUs using the public implementation of greedy forward selection (Ye, 2021)."
Software Dependencies | No | The paper mentions using a "public implementation of greedy forward selection (Ye, 2021)" and adopts the settings of a widely used, open-source repository (pytorch-cifar, https://github.com/kuangliu/pytorch-cifar, 2017), but it does not specify version numbers for any software components such as Python, PyTorch, or other libraries.
Experiment Setup | Yes | "The two-layer network is pre-trained for 8K iterations in total and pruned every 1K iterations to a size of 200 hidden nodes. Pre-training is conducted for 80K iterations using SGD with momentum and a cosine learning rate decay schedule starting at 0.1. We use a batch size of 128 and weight decay of 5e-4. The dense model is independently pruned every 20K iterations, and subnetworks are fine-tuned for 2500 iterations with an initial learning rate of 0.01 before being evaluated. We adopt ε = 0.02 and ε = 0.05 for MobileNetV2 and ResNet34, respectively... Models are pre-trained for 150 epochs using SGD with momentum and cosine learning rate decay with an initial value of 0.1. We use a batch size of 128 and weight decay of 5e-4. The dense network is independently pruned every 50 epochs, and the subnetwork is fine-tuned for 80 epochs using a cosine learning rate schedule with an initial value of 0.01 before being evaluated."
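The greedy forward selection procedure assessed above (Algorithm 1 of the paper) repeatedly adds, starting from an empty subnetwork, the hidden neuron whose inclusion most reduces the loss of the averaged subnetwork output. The sketch below is a hypothetical NumPy simplification for a two-layer regression network, not the paper's implementation; the function name, the mean-squared-error objective, and selection with replacement are illustrative assumptions.

```python
import numpy as np

def greedy_forward_selection(neuron_outputs, target, k):
    """Greedily pick k neurons whose averaged outputs best fit the target.

    neuron_outputs: (m, n) array, per-neuron output on n training samples.
    target: (n,) regression target.
    Returns the selected neuron indices and the final subnetwork loss.
    """
    m, _ = neuron_outputs.shape
    selected = []
    current = np.zeros_like(target, dtype=float)
    for _ in range(k):
        best_j, best_loss = None, np.inf
        for j in range(m):
            # Output of the subnetwork if neuron j were added (running average).
            trial = (current * len(selected) + neuron_outputs[j]) / (len(selected) + 1)
            loss = np.mean((trial - target) ** 2)
            if loss < best_loss:
                best_loss, best_j = loss, j
        selected.append(best_j)
        current = (current * (len(selected) - 1) + neuron_outputs[best_j]) / len(selected)
    return selected, best_loss
```

If one neuron's output already matches the target, the first greedy step selects it and the loss drops to zero; in the paper's setting this inner loop is repeated at each pruning checkpoint during pre-training.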
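The class-balanced sub-dataset construction and the random 80-20 validation split reported in the Dataset Splits row are straightforward to reproduce. A minimal NumPy sketch follows; the function names and the fixed seed are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def make_subdataset(labels, size, num_classes=10, seed=0):
    """Uniformly sample `size` examples, size // num_classes per class."""
    rng = np.random.default_rng(seed)
    per_class = size // num_classes
    idx = []
    for c in range(num_classes):
        pool = np.flatnonzero(labels == c)  # indices of class-c examples
        idx.append(rng.choice(pool, per_class, replace=False))
    return np.concatenate(idx)

def train_val_split(indices, val_frac=0.2, seed=0):
    """Random 80-20 split, as used for the CIFAR10 grid search."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(indices)
    cut = int(len(perm) * (1 - val_frac))
    return perm[:cut], perm[cut:]
```

For example, `make_subdataset(labels, 10_000)` would yield a 10K CIFAR10 sub-dataset with exactly 1K examples per class, matching the uniform-sampling construction described in the paper.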
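Both training setups in the Experiment Setup row use cosine learning rate decay (initial value 0.1 for pre-training, 0.01 for fine-tuning). A minimal sketch of the standard schedule, assuming decay to zero over the full run with no warmup (the paper does not spell out these details):

```python
import math

def cosine_lr(step, total_steps, base_lr):
    """Standard cosine annealing from base_lr down to 0."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))

# Pre-training per the paper: 80K iterations starting at 0.1;
# fine-tuning: 2500 iterations starting at 0.01.
```

At step 0 this returns the full base rate, at the midpoint half of it, and at the final step approximately zero, which is the behavior the quoted setup describes.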