Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline and validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Precision Gating: Improving Neural Network Efficiency with Dynamic Dual-Precision Activations

Authors: Yichi Zhang, Ritchie Zhao, Weizhe Hua, Nayun Xu, G. Edward Suh, Zhiru Zhang

ICLR 2020 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments indicate that PG achieves excellent results on CNNs, including statically compressed mobile-friendly networks such as ShuffleNet. Compared to the state-of-the-art prediction-based quantization schemes, PG achieves the same or higher accuracy with 2.4× less compute on ImageNet. PG furthermore applies to RNNs. Compared to 8-bit uniform quantization, PG obtains a 1.2% improvement in perplexity per word with 2.7× computational cost reduction on LSTM on the Penn Treebank dataset.
Researcher Affiliation | Academia | Yichi Zhang (Cornell University), Ritchie Zhao (Cornell University), Weizhe Hua (Cornell University), Nayun Xu (Cornell Tech), G. Edward Suh (Cornell University), Zhiru Zhang (Cornell University)
Pseudocode | No | No pseudocode or algorithm blocks were found; the paper describes the method using equations and explanatory text. (A hedged sketch of the dual-precision computation is given after this table.)
Open Source Code | No | We will release the source code on the authors' website.
Open Datasets | Yes | We evaluate PG using ResNet-20 (He et al., 2016a) and ShiftNet-20 (Wu et al., 2018a) on CIFAR-10 (Krizhevsky & Hinton, 2009), and ShuffleNet V2 (Ma et al., 2018) on ImageNet (Deng et al., 2009). We also test an LSTM model (Hochreiter & Schmidhuber, 1997) on the Penn Treebank (PTB) (Marcus et al., 1993) corpus.
Dataset Splits | No | The paper gives training details but does not provide explicit train/validation/test splits. For example, "On CIFAR-10, the batch size is 128, and the models are trained for 200 epochs." does not specify the splits.
Hardware Specification | Yes | All experiments are conducted on TensorFlow (Abadi et al., 2016) with NVIDIA GeForce 1080 Ti GPUs. One of the Titan Xp GPUs used for this research was donated by the NVIDIA Corporation. Intel Xeon Silver 4114 CPU (2.20 GHz).
Software Dependencies | No | All experiments are conducted on TensorFlow (Abadi et al., 2016) with NVIDIA GeForce 1080 Ti GPUs. We implement the SDDMM kernel in Python leveraging the high-performance JIT compiler Numba (Lam et al., 2015). No version numbers for TensorFlow or Numba are specified.
Experiment Setup | Yes | On CIFAR-10, the batch size is 128, and the models are trained for 200 epochs. The initial learning rate is 0.1 and decays at epochs 100, 150, and 200 by a factor of 0.1 (i.e., the learning rate is multiplied by 0.1). On ImageNet, the batch size is 512 and the models are trained for 120 epochs. The learning rate decays linearly from an initial value of 0.5 to 0. The number of hidden units in the LSTM cell is set to 300, and the number of layers is set to 1. The full bitwidth B... The prediction bitwidth Bhb... The penalty factor σ... The gating target δ... The coefficient α in the backward pass... (A schedule sketch based on these quotes follows the table.)
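
Because the paper describes precision gating only through equations and text, the following is a minimal NumPy sketch of the dual-precision idea as we read it: a layer output is first predicted using only the high-order Bhb bits of each activation, and outputs whose prediction exceeds the gating threshold δ are updated with the remaining low-order bits of the full B-bit activation. All names here (trunc_quantize, pg_linear) and the fixed delta are our assumptions; in the paper δ is a learnable per-layer threshold, and the authors' implementation is in TensorFlow.

```python
# Minimal sketch of precision gating (PG) for one linear layer, assuming
# unsigned activations in [0, 1). Names and details are assumptions, not the
# authors' released implementation.
import numpy as np

def trunc_quantize(x, bits, max_val=1.0):
    """Truncate x in [0, max_val) to an unsigned fixed-point value with `bits` bits."""
    scale = (2 ** bits) / max_val
    return np.floor(np.clip(x, 0.0, max_val * (1.0 - 1e-9)) * scale) / scale

def pg_linear(x, W, full_bits=4, high_bits=2, delta=0.5):
    """Predict outputs with the high-order bits, then update gated outputs with the rest."""
    x_full = trunc_quantize(x, full_bits)   # B-bit activation
    x_high = trunc_quantize(x, high_bits)   # its top Bhb bits (prediction phase)
    x_low = x_full - x_high                 # remaining low-order bits
    y_pred = x_high @ W                     # cheap prediction
    gate = y_pred > delta                   # "important" outputs (learnable threshold in the paper)
    # Computed densely here for clarity; PG's compute savings come from
    # evaluating the update only where gate is True.
    y_update = x_low @ W
    return np.where(gate, y_pred + y_update, y_pred), gate

rng = np.random.default_rng(0)
x = rng.random((8, 16))                     # activations assumed in [0, 1)
W = rng.standard_normal((16, 4)) * 0.1
y, gate = pg_linear(x, W)
print("fraction of outputs updated at full precision:", gate.mean())
```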
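
The learning-rate schedules quoted in the Experiment Setup row can be summarized in a short sketch. The step milestones and the linear decay come directly from the quotes above; optimizer choice, momentum, and weight decay are not quoted, so they are omitted rather than guessed.

```python
# Learning-rate schedules implied by the quoted setup.

def cifar10_lr(epoch, base_lr=0.1, milestones=(100, 150, 200)):
    """Step schedule: start at 0.1 and multiply by 0.1 at each milestone epoch."""
    return base_lr * 0.1 ** sum(epoch >= m for m in milestones)

def imagenet_lr(epoch, base_lr=0.5, total_epochs=120):
    """Linear decay from 0.5 to 0 over the 120 training epochs."""
    return base_lr * max(0.0, 1.0 - epoch / total_epochs)

# CIFAR-10: batch size 128, 200 epochs; ImageNet: batch size 512, 120 epochs.
print([cifar10_lr(e) for e in (0, 99, 100, 150, 199)])
print([round(imagenet_lr(e), 3) for e in (0, 60, 119)])
```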