ReDistill: Residual Encoded Distillation for Peak Memory Reduction of CNNs
Authors: Fang Chen, Gourav Datta, Mujahid Al Rafi, Hyeran Jeon, Meng Tang
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate our method's superior performance compared to other feature-based and response-based distillation methods when applied to the same student network. The code is available at https://github.com/mengtanglab/ReDistill. |
| Researcher Affiliation | Academia | Fang Chen (1), Gourav Datta (2), Mujahid Al Rafi (1), Hyeran Jeon (1), Meng Tang (1). Affiliations: (1) University of California, Merced; (2) Case Western Reserve University. |
| Pseudocode | No | The paper includes diagrams (Figure 2, Figure 3) and textual descriptions of the method, but no explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/mengtanglab/ReDistill. |
| Open Datasets | Yes | 1) STL10 (Coates et al. (2011)) contains 5K training images with 10 classes and 8K testing images of resolution 96×96 pixels. 2) ImageNet (Russakovsky et al. (2015)) is a widely-used classification dataset, which provides 1.2 million images for training and 50K images for validation over 1,000 classes. 3) CIFAR-10 (Krizhevsky et al. (2009)) comprises 60,000 color images of 32×32 resolution across 10 classes, with each class containing 6,000 images. 4) Celeb-A (Liu et al. (2015)) is a large-scale face attributes dataset containing over 200,000 celebrity images, each annotated with 40 attributes. |
| Dataset Splits | Yes | 1) STL10 (Coates et al. (2011)) contains 5K training images with 10 classes and 8K testing images of resolution 96×96 pixels. 2) ImageNet (Russakovsky et al. (2015)) is a widely-used classification dataset, which provides 1.2 million images for training and 50K images for validation over 1,000 classes. 3) CIFAR-10 (Krizhevsky et al. (2009)) comprises 60,000 color images of 32×32 resolution across 10 classes, with each class containing 6,000 images. The dataset is divided into 50,000 training images and 10,000 test images. |
| Hardware Specification | Yes | All experiments are implemented in PyTorch and evaluated on 4 NVIDIA A100 GPUs. All experiments are implemented in PyTorch and evaluated on an NVIDIA 4090 GPU. In addition to the theoretical peak memory, we also measure the actual peak memory consumed on an NVIDIA Jetson TX2 device. |
| Software Dependencies | No | The paper mentions 'PyTorch' as the implementation framework and the 'TorchAO' toolkit, but no specific version numbers are provided for these software dependencies, which are required for reproducibility. |
| Experiment Setup | Yes | On the STL10 dataset... trained from scratch... for 300 epochs. The batch size is set to 8 and the dropout rate is set to 0.2. SGD with momentum equal to 0.9 is used as the optimizer. The initial learning rate is set to 0.01, which is reduced by a factor of 0.2 at the 180th, 240th and 270th epoch, respectively. The α in Equation 9 is set to 50. On the ImageNet dataset... keep training for 300 epochs and decay the learning rate at the 180th, 240th and 270th epoch with factor 0.1... The α in Equation 9 is set to 1. For the teacher model, we keep the same experiment settings as DDPM (Ho et al. (2020)), applying T = 1000, β₁ = 10⁻⁴, β_T = 0.02... For the CIFAR-10 dataset, we train all models for 1000K iterations... For the Celeb-A dataset, we train all models with 250K iterations... |
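The STL10 step-decay schedule quoted in the Experiment Setup row (initial learning rate 0.01, multiplied by 0.2 at epochs 180, 240, and 270 over a 300-epoch run) can be sketched as a small helper. This is an illustrative reconstruction of the quoted hyperparameters, not code from the paper's repository; the function name is hypothetical.

```python
def stl10_lr(epoch, base_lr=0.01, factor=0.2, milestones=(180, 240, 270)):
    """Step-decay schedule from the paper's quoted STL10 setup:
    the LR is multiplied by `factor` at each milestone epoch passed."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * (factor ** passed)

# Inspect the LR at the boundaries of the 300-epoch run
for e in (0, 179, 180, 240, 270):
    print(e, stl10_lr(e))
```

In PyTorch this corresponds to `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[180, 240, 270], gamma=0.2)` wrapped around an SGD optimizer with `lr=0.01, momentum=0.9`, matching the quoted setup.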