Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities

Authors: Brian R. Bartoldson, Bhavya Kailkhura, Davis Blalock

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Davis conducted all experiments and led the creation of a guide to achieving speedups in practice. From the paper: "To address these fragmentation issues, we eschew a more traditional survey approach that focuses on just a single component (e.g., the model) or single action (e.g., reducing model size) in the training pipeline. Instead, we adopt a wholistic view of the speedup problem and emphasize that one needs to carefully select a combination of techniques, which we survey in Section 3, to overcome various compute-platform bottlenecks. We use experiments to illustrate the importance of such a wholistic view to achieving speedup in practice, and we provide guidance informed by the relationships between different bottlenecks and components of training." Appendix B. Experimental Details: "All models were trained on a single machine with eight A100s and two 32-core AMD EPYC 7513 processors."
Researcher Affiliation | Collaboration | Brian R. Bartoldson, EMAIL, Lawrence Livermore National Laboratory, USA; Bhavya Kailkhura, EMAIL, Lawrence Livermore National Laboratory, USA; Davis Blalock, Davis@mosaicml.com, MosaicML, USA
Pseudocode | No | The paper describes various algorithms and methods but does not provide structured pseudocode blocks or algorithms labeled as such. It mainly surveys existing techniques and discusses their mechanisms.
Open Source Code | No | The paper does not provide a specific link to a code repository for its own methodology or an explicit statement of code release. It mentions using existing libraries like Composer (Team, 2021) for experiments, but this refers to third-party tools rather than the authors' own implementation for the paper's novel contributions: "We chose these methods because all had tested implementations in a common library." The license information pertains to the paper itself, not source code.
Open Datasets | Yes | Experiments on CIFAR-10, CIFAR-100, and SVHN show that Selective-Backprop can achieve a 3.5x speedup compared to standard SGD in exchange for a decrease in accuracy. On CIFAR-10, CIFAR-100, and FOOD-101(N), they found that scoring and ordering had no effect on model quality at convergence. Sorscher et al. (2022) evaluate many existing data pruning metrics, finding that they perform poorly on ImageNet even when they performed well on smaller datasets like CIFAR-10. Dubois et al. (2021) provide a minimal script that trains an image encoder, encodes the STL dataset, and trains a linear classifier on the resulting encodings to 98.7% accuracy in under five minutes. Gonzalez and Miikkulainen (2020) apply genetic programming to learn loss functions from primitive operations, using MNIST validation dataset performance as a signal.
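The Selective-Backprop result cited above rests on a simple idea: spend backward passes only on the examples the model currently finds hard. The sketch below illustrates that idea with a deterministic top-k filter over per-example losses; the actual method of Jiang et al. samples probabilistically from the loss distribution, and `keep_frac` is an illustrative parameter, not a value from the paper.

```python
import numpy as np

def select_for_backprop(losses, keep_frac=0.5):
    """Simplified Selective-Backprop-style filter: keep only the
    highest-loss fraction of a batch for the backward pass.
    (The published method samples by loss percentile; this
    deterministic top-k variant is a hedged approximation.)"""
    losses = np.asarray(losses)
    k = max(1, int(len(losses) * keep_frac))
    # Indices of the k largest per-example losses (hardest examples).
    return np.argsort(losses)[-k:]

# Example: a batch of 8 per-example losses.
losses = [0.1, 2.3, 0.05, 1.7, 0.4, 3.1, 0.2, 0.9]
idx = select_for_backprop(losses, keep_frac=0.25)
# The backward pass would then run only on batch[idx].
```

Skipping the backward pass for easy examples is where the speedup comes from, since backprop typically costs about twice as much as the forward pass.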
Dataset Splits | No | The paper mentions using well-known datasets like ImageNet, CIFAR-10, and CIFAR-100, which typically have standard splits, but it does not explicitly state the train/validation/test splits used for its own experiments.
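When a paper relies on standard splits without stating them, reproductions often need to fix their own held-out split. A minimal sketch of a deterministic, seeded split by shuffled index; the fraction and seed here are illustrative choices, not values from the paper:

```python
import numpy as np

def make_splits(n, val_frac=0.1, seed=0):
    """Deterministic train/validation split over n examples.
    A fixed seed makes the split reproducible across runs."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_val = int(n * val_frac)
    return idx[n_val:], idx[:n_val]  # train indices, val indices

# Example: CIFAR-10-sized training set (50,000 examples).
train_idx, val_idx = make_splits(50_000, val_frac=0.1, seed=0)
```

Recording the seed and fraction alongside results is usually enough to make such a split reproducible.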
Hardware Specification | Yes | Appendix B. Experimental Details: "All models were trained on a single machine with eight A100s and two 32-core AMD EPYC 7513 processors. Microbenchmarking experiments used a single A100, with means and standard deviations computed from five trials. All results use half-precision weights and activations."
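The five-trial mean-and-standard-deviation protocol quoted above is easy to replicate. A minimal pure-Python sketch of that timing loop; real GPU microbenchmarks additionally need warmup iterations and device synchronization (e.g. `torch.cuda.synchronize()`), which are omitted here:

```python
import statistics
import time

def microbenchmark(fn, trials=5):
    """Time fn over several trials; report mean and standard
    deviation of wall-clock seconds, mirroring the five-trial
    protocol described in the paper's Appendix B."""
    times = []
    for _ in range(trials):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return statistics.mean(times), statistics.stdev(times)

# Example: benchmark a trivial CPU-bound workload.
mean_s, std_s = microbenchmark(lambda: sum(range(100_000)), trials=5)
```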
Software Dependencies | No | The paper mentions using PyTorch for microbenchmarking: "we profile individual PyTorch (Paszke et al., 2017) operations on a 40GB A100". It also refers to Composer (Team, 2021) as a common library where some methods are implemented. However, specific version numbers for PyTorch, Composer, or other critical software dependencies are not provided within the text.
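Missing version numbers are a recoverable gap if the environment is logged at run time. A sketch of recording installed package versions with the standard library; the package names listed are examples, not a dependency list from the paper:

```python
from importlib import metadata

def dependency_versions(packages):
    """Record installed versions for the named packages.
    Packages that are not installed are reported as None
    rather than raising, so the log is always complete."""
    out = {}
    for name in packages:
        try:
            out[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            out[name] = None
    return out

# Example: log the versions relevant to this paper's experiments.
versions = dependency_versions(["torch", "numpy"])
```

Emitting such a dictionary alongside results would have made the "Software Dependencies" criterion trivially satisfiable.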
Experiment Setup | No | Appendix B. Experimental Details: "We chose the hyperparameters for speedup methods in Figure 13 based on the hyperparameters used for these recipes. When the hyperparameters did not vary across recipes for a given speedup, we made up similar hyperparameters on a best-effort basis that would allow for assessing alternate speed vs accuracy tradeoffs, e.g., choosing different degrees of progressive resizing. These hyperparameters may not be optimal, so it is important to conclude only that certain baselines can outperform these methods, not that they always will."
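Progressive resizing, named above as one of the varied speedup hyperparameters, trains on small images early and ramps up to full resolution. A sketch of one plausible linear schedule; every value here (start/end size, ramp fraction, rounding multiple) is a hypothetical hyperparameter for illustration, not one of the paper's recipe settings:

```python
def progressive_resize_schedule(epoch, total_epochs,
                                start_size=160, end_size=224,
                                grow_frac=0.75, multiple=32):
    """Illustrative progressive-resizing schedule: ramp the image
    size linearly from start_size to end_size over the first
    grow_frac of training, then hold at full resolution.
    All defaults are hypothetical, not from the paper."""
    ramp_epochs = max(1, int(total_epochs * grow_frac))
    progress = min(1.0, epoch / ramp_epochs)
    size = start_size + progress * (end_size - start_size)
    # Round down to a hardware-friendly multiple.
    return int(size // multiple * multiple)

# Example: image sizes at the start, middle, and end of 100 epochs.
sizes = [progressive_resize_schedule(e, 100) for e in (0, 40, 80)]
```

Varying `grow_frac` or `start_size` changes how aggressively the schedule trades accuracy for speed, which matches the "different degrees of progressive resizing" the appendix describes.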