Weight matrices compression based on PDB model in deep neural networks
Authors: Xiaoling Wu, Junpeng Zhu, Zeng Li
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our PDB model fits the empirical distribution of eigenvalues of the weight matrix better than the PUB model, and our compressed weight matrices have lower rank at the same level of test accuracy. In some cases, our compression method can even improve generalization performance when labels contain noise. ... In this section, we conduct numerical experiments to demonstrate the superiority of our PDB model and the effectiveness of the weight matrix compression algorithm. |
| Researcher Affiliation | Academia | Department of Statistics and Data Science, Southern University of Science and Technology, Shenzhen, China. Correspondence to: Zeng Li <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 PDBLS algorithm (For estimation) ... Algorithm 2 PDB Noise-Filtering algorithm (For matrix compression) |
| Open Source Code | Yes | The code is available at https://github.com/xlwu571/PDBLS. |
| Open Datasets | Yes | The FCNN is trained on MNIST, while ResNet18 and VGG16 are evaluated on CIFAR10 (Krizhevsky et al., 2009) and ImageNet (Deng et al., 2009). ... For the language models, experiments are conducted on the RTE (Wang et al., 2018) and SciTail (Khot et al., 2018) datasets, while the vision model is tested on DTD (Cimpoi et al., 2014) and SUN397 (Xiao et al., 2016). |
| Dataset Splits | Yes | We evaluate generalization performance using test accuracy and employ three basic neural network architectures, the three-layer Fully Connected Neural Network (FCNN), Residual Network-18 (ResNet18) (He et al., 2015), and Visual Geometry Group-16 (VGG16) (Simonyan & Zisserman, 2015). The FCNN is trained on MNIST, while ResNet18 and VGG16 are evaluated on CIFAR10 (Krizhevsky et al., 2009) and ImageNet (Deng et al., 2009). ... Additionally, we assess the generalization of three representative pre-trained architectures: BERT (Devlin et al., 2019) and T5-base (Raffel et al., 2020) for natural language processing, and ViT-L (Dosovitskiy, 2020) for computer vision. ... For the language models, experiments are conducted on the RTE (Wang et al., 2018) and SciTail (Khot et al., 2018) datasets, while the vision model is tested on DTD (Cimpoi et al., 2014) and SUN397 (Xiao et al., 2016). |
| Hardware Specification | Yes | All codes in the experiment are conducted on the server equipped with NVIDIA L40 GPUs and Ubuntu 22.04. |
| Software Dependencies | No | The text mentions optimizers (SGD, AdamW) and activation functions (ReLU, Softmax) but does not provide specific version numbers for any libraries or frameworks like PyTorch, TensorFlow, or scikit-learn. It only mentions the operating system version 'Ubuntu 22.04'. |
| Experiment Setup | Yes | Each image is normalized to the range of [0,1], and the weight matrices of the networks are initialized by the Glorot uniform distribution (Glorot & Bengio, 2010). For basic architectures, we employ SGD with an exponential decay learning rate during the training phase. The activation functions used are ReLU(·) for hidden layers and Softmax(·) for the output layer. For large-scale models, we utilize the AdamW (Loshchilov & Hutter, 2017) optimizer in conjunction with a cosine learning rate scheduler. |
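The pseudocode row names a "PDB Noise-Filtering algorithm (For matrix compression)"; the paper's own algorithm is not reproduced in this report, but the general idea it instantiates — discarding singular values of a weight matrix that fall below a noise edge — can be sketched. The sketch below is a generic spectral filter with an illustrative i.i.d.-noise threshold, not the PDB model's threshold; the function name `spectral_compress` and the `sigma` parameter are assumptions for illustration.

```python
import numpy as np

def spectral_compress(W, sigma):
    """Generic spectral noise filtering (illustrative, NOT the paper's PDB algorithm).

    Keeps only singular values above the largest singular value expected from
    a pure-noise p x n matrix with i.i.d. entries of standard deviation sigma,
    i.e. roughly sigma * (sqrt(p) + sqrt(n)).
    """
    p, n = W.shape
    edge = sigma * (np.sqrt(p) + np.sqrt(n))       # assumed noise edge
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    k = int(np.sum(s > edge))                      # retained rank
    W_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # low-rank reconstruction
    return W_hat, k
```

The point of such a filter, as in the paper's experiments, is that the compressed matrix has lower rank while (ideally) preserving the signal part of the spectrum.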
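The quoted experiment setup (Glorot uniform initialization, ReLU/Softmax activations, exponentially decayed learning rate) can be sketched in plain numpy. This is a minimal illustration of those standard components, not the authors' code; the function names are assumptions.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, seed=0):
    # Glorot & Bengio (2010): U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))
    rng = np.random.default_rng(seed)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def relu(x):
    # ReLU hidden-layer activation
    return np.maximum(x, 0.0)

def softmax(x):
    # Softmax output-layer activation, shifted for numerical stability
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def exp_decay_lr(lr0, decay_rate, step):
    # exponential learning-rate decay: lr0 * decay_rate ** step
    return lr0 * decay_rate ** step
```

For the large-scale models the paper instead uses AdamW with a cosine schedule; in a framework setting these would correspond to an AdamW optimizer paired with a cosine-annealing scheduler rather than the exponential decay sketched here.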