Resource-Efficient Neural Networks for Embedded Systems
Authors: Wolfgang Roth, Günther Schindler, Bernhard Klein, Robert Peharz, Sebastian Tschiatschek, Holger Fröning, Franz Pernkopf, Zoubin Ghahramani
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We substantiate our discussion with experiments on well-known benchmark data sets using compression techniques (quantization, pruning) for a set of resource-constrained embedded systems, such as CPUs, GPUs and FPGAs. The obtained results highlight the difficulty of finding good trade-offs between resource efficiency and prediction quality. |
| Researcher Affiliation | Academia | Wolfgang Roth EMAIL Graz University of Technology, Austria Günther Schindler EMAIL Heidelberg University, Germany Sebastian Tschiatschek EMAIL University of Vienna, Austria Zoubin Ghahramani EMAIL University of Cambridge, UK |
| Pseudocode | No | The paper does not contain any explicit pseudocode or algorithm blocks. Figure 2 illustrates a simplified building block of a DNN but is a diagram, not pseudocode. |
| Open Source Code | No | The paper mentions using the FINN framework (Umuroglu et al., 2017) and the TensorRT framework, but there is no explicit statement or link indicating that the authors have released their own source code for the methodology described in this paper. |
| Open Datasets | Yes | We provide a comparison of various quantization approaches for DNNs using the CIFAR-100 data set in Section 5.1.1, followed by an evaluation of prediction quality for different types of pruned structures on the CIFAR-10 data set in Section 5.1.2. ... ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS), pages 1106–1114, 2012. |
| Dataset Splits | Yes | We conduct our experiments on the CIFAR-100 data set where the task is to classify RGB images of size 32x32 pixels to one of 100 object categories. The CIFAR-100 data set is split into 50,000 training images and 10,000 test images. ... CIFAR-10 is similar to CIFAR-100 used in the previous section (i.e., image size and size of training and test sets are equal) except that it contains only ten object classes. |
| Hardware Specification | Yes | We evaluate the inference throughput of the compressed models on an ARM CPU (Section 5.2.1), Xilinx FPGA (Section 5.2.2) and an embedded NVIDIA GPU (Section 5.2.3). ... ARM Cortex-A53 CPU. ... XILINX Ultra96 FPGA. ... Jetson Nano GPU. |
| Software Dependencies | No | The paper mentions software like the TensorRT framework and the FINN framework, but it does not specify any version numbers for these or other software libraries or dependencies. For example, it states 'TensorRT framework targeting a Jetson Nano GPU' but no version. |
| Experiment Setup | Yes | We use a DenseNet architecture (Huang et al., 2017) consisting of 100 layers with bottleneck and compression layers, i.e., a DenseNet-BC-100. We select the default growth rate of k = 12 for the model... We selected some of the most popular quantization approaches (see Section 3.1) for our comparison... For this experiment, we quantize the DNNs in three different modes: (i) weight-only, (ii) activation-only, and (iii) combined weight and activation quantization. ... We use wide residual networks (WRNs) by Zagoruyko and Komodakis (2016) with a depth of 28 layers... |
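The three quantization modes quoted in the Experiment Setup row (weight-only, activation-only, and combined) can be illustrated with a minimal sketch. The symmetric uniform quantizer below is a generic stand-in for illustration only; it is not any of the specific quantization schemes the paper compares, and the `quantize` helper and toy shapes are assumptions.

```python
import numpy as np

def quantize(x, bits=8):
    """Simulated symmetric uniform quantization (illustrative, not the paper's method)."""
    scale = np.max(np.abs(x)) + 1e-12      # per-tensor scale factor
    levels = 2 ** (bits - 1) - 1           # e.g. 127 representable levels for 8 bits
    q = np.round(x / scale * levels)       # snap to integer grid
    return q * scale / levels              # de-quantize back to float for simulation

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3))                # toy weight matrix of one linear layer
a = rng.normal(size=(3,))                  # toy input activation vector

# (i) weight-only: quantize parameters, keep activations in full precision
y_weight_only = quantize(w, bits=4) @ a
# (ii) activation-only: quantize inputs, keep weights in full precision
y_act_only = w @ quantize(a, bits=4)
# (iii) combined weight and activation quantization
y_combined = quantize(w, bits=4) @ quantize(a, bits=4)
```

In a real evaluation such as the paper's, the quantized values would be stored and executed as low-bit integers on the target hardware (ARM CPU, FPGA, or embedded GPU); the float round-trip here only simulates the induced rounding error.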