CAT: Compression-Aware Training for bandwidth reduction
Authors: Chaim Baskin, Brian Chmiel, Evgenii Zheltonozhskii, Ron Banner, Alex M. Bronstein, Avi Mendelson
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | CAT significantly improves the state-of-the-art results reported for quantization evaluated on various vision and NLP tasks, such as image classification (ImageNet), image detection (Pascal VOC), sentiment analysis (CoLA), and textual entailment (MNLI). For example, on ResNet-18, we achieve near baseline ImageNet accuracy with an average representation of only 1.5 bits per value with 5-bit quantization. Moreover, we show that entropy reduction of weights and activations can be applied together, further improving bandwidth reduction. Reference implementation is available. |
| Researcher Affiliation | Collaboration | Chaim Baskin EMAIL Technion - Israel Institute of Technology, Haifa, Israel; Brian Chmiel EMAIL Habana Labs (an Intel company), Caesarea, Israel; Evgenii Zheltonozhskii EMAIL Technion - Israel Institute of Technology, Haifa, Israel |
| Pseudocode | No | The paper describes the proposed methods, such as 'Differentiable Entropy-Reducing Loss' in Section 3.2, using mathematical formulations and descriptive text. However, it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks or figures. |
| Open Source Code | Yes | Reference implementation is available. |
| Open Datasets | Yes | We evaluate the proposed scheme on common CNN architectures for image classification (ResNet-18/34/50, MobileNetV2) on ImageNet (Russakovsky et al., 2015), object detection (SSD512, Liu et al., 2016) on Pascal VOC (Everingham et al., 2010), as well as Transformers (BERT, Devlin et al., 2019) for sentiment analysis on CoLA (Warstadt et al., 2019) and for textual entailment on MNLI (Williams et al., 2018). |
| Dataset Splits | No | The paper mentions using well-known datasets like ImageNet, Pascal VOC, CoLA, and MNLI for evaluation. While it details training procedures and parameters, it does not explicitly provide information regarding specific training/validation/test splits (e.g., percentages, sample counts, or references to predefined splits) for these datasets. |
| Hardware Specification | No | The paper mentions general hardware terms such as 'custom hardware' and 'single GPU', but it does not specify exact models of GPUs, CPUs, or other detailed processor/memory specifications used for running the experiments. For example, in Section 4.1.4, it states: 'We noticed that training ResNet-50 on a single GPU mandated the use of small batches, leading to performance degradation.' |
| Software Dependencies | No | The paper mentions that 'Our code is based on an implementation by Li (2018)' (which refers to a PyTorch implementation for SSD512). However, it does not provide specific version numbers for any software, libraries, or frameworks used in the methodology (e.g., PyTorch version, Python version, CUDA version). |
| Experiment Setup | Yes | For optimization, we used SGD with a learning rate of 10^-4, momentum 0.9, and weight decay 4×10^-5 for up to 30 epochs (usually, 10 to 15 epochs were sufficient for convergence). Our initial choice of temperature was T = 10, which performed well. We tried to apply exponential scheduling to the temperature (Jang et al., 2017), but it did not have any noticeable effect on the results. |
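The paper provides no pseudocode for its differentiable entropy-reducing loss (Section 3.2), so the following is only a rough illustration of the general idea, not the authors' implementation: each value is softly assigned to quantization bins via a temperature-scaled softmax over negative squared distances, and the Shannon entropy of the aggregate bin distribution is the quantity to be minimized. The bin centers, the squared-distance logits, and the function name `soft_entropy` are all assumptions.

```python
import math

def soft_entropy(values, bins, temperature=10.0):
    """Soft (differentiable-style) entropy estimate of `values` over
    quantization `bins`. Each value gets a softmax assignment over
    -temperature * (v - b)^2; the entropy (in bits) of the averaged
    assignment distribution is returned. Illustrative sketch only."""
    n = len(values)
    probs = [0.0] * len(bins)
    for v in values:
        # Temperature-scaled softmax: large T -> nearly hard (one-hot) binning
        logits = [-temperature * (v - b) ** 2 for b in bins]
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        for i, e in enumerate(exps):
            probs[i] += e / (z * n)
    # Shannon entropy of the aggregate bin-occupancy distribution
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

Minimizing such a quantity during training pushes quantized values toward a peaked (low-entropy) distribution, which is what enables the sub-2-bit average representations the paper reports.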
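The quoted optimizer settings (SGD, learning rate 10^-4, momentum 0.9, weight decay 4×10^-5) can be sketched as a single plain-Python update step; `sgd_step` and the folding of L2 weight decay into the gradient are illustrative choices (matching PyTorch's SGD convention), not the authors' code.

```python
def sgd_step(weights, grads, velocity,
             lr=1e-4, momentum=0.9, weight_decay=4e-5):
    """One SGD-with-momentum update using the hyperparameters reported
    in the paper. `velocity` is updated in place; returns new weights.
    Illustrative sketch only."""
    new_weights = []
    for i, (w, g) in enumerate(zip(weights, grads)):
        g = g + weight_decay * w              # L2 decay folded into the gradient
        velocity[i] = momentum * velocity[i] + g
        new_weights.append(w - lr * velocity[i])
    return new_weights
```

In a real run this step would be repeated for up to 30 epochs, with convergence typically reached after 10 to 15 epochs per the paper.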