Deep Weight Factorization: Sparse Learning Through the Lens of Artificial Symmetries

Authors: Chris Kolb, Tobias Weber, Bernd Bischl, David Rügamer

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our deep weight factorization through experiments on various architectures and datasets, consistently outperforming its shallow counterpart and widely used pruning methods.
Researcher Affiliation | Academia | Chris Kolb, Tobias Weber, Bernd Bischl & David Rügamer, Department of Statistics, LMU Munich; Munich Center for Machine Learning (MCML), Munich. EMAIL
Pseudocode | Yes | D Algorithms: In the following, we provide the algorithms for the proposed initialization (Section 4.1) of DWF networks in Appendix D.1 and how to train these networks in Appendix D.2. D.1 DWF Initialization — Algorithm 1: DWF Initialization with Variance-Matching and Absolute Value Truncation [...] D.2 DWF Training — Algorithm 2: Training Factorized Neural Networks
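To make the "variance-matching" idea in Algorithm 1 concrete, here is a minimal, self-contained sketch of a depth-D elementwise weight factorization. It assumes (as DWF-style methods do) that each weight is the Hadamard product of D factor vectors, and that for independent zero-mean factors the product variance is the product of the factor variances, so each factor is drawn with std `target_std ** (1/depth)`. All names are hypothetical, and the paper's absolute-value truncation step is not reproduced here.

```python
import random


def dwf_init(n, depth, target_std, seed=0):
    """Variance-matching sketch: draw `depth` zero-mean Gaussian factor
    vectors whose elementwise product has (approximately) the target
    variance. Illustrative only, not the paper's Algorithm 1."""
    rng = random.Random(seed)
    factor_std = target_std ** (1.0 / depth)  # product var = target_std ** 2
    return [[rng.gauss(0.0, factor_std) for _ in range(n)] for _ in range(depth)]


def collapse(factors):
    """Collapse the factors into effective weights w = u1 * u2 * ... * uD."""
    w = [1.0] * len(factors[0])
    for u in factors:
        for i, ui in enumerate(u):
            w[i] *= ui
    return w
```

With `depth=2` this reduces to the familiar shallow Hadamard-product parameterization; deeper factorizations are what the paper's "deep" variant trains with standard weight decay on the factors.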
Open Source Code | No | The paper does not contain any explicit statements or links indicating the availability of source code for the methodology described.
Open Datasets | Yes | Our experiments cover commonly used computer vision benchmarks: LeNet-300-100 and LeNet-5 (LeCun et al., 1998) on MNIST, Fashion-MNIST, and Kuzushiji-MNIST, VGG-16 and VGG-19 (Simonyan & Zisserman, 2014) on CIFAR10 and CIFAR100, and ResNet-18 (He et al., 2016) on CIFAR10 and Tiny ImageNet.
Dataset Splits | Yes | All datasets are split into training (50,000 or 60,000 samples) and test (10,000 samples) sets.
Hardware Specification | Yes | Large experiments on ResNet-18 and VGG-19 on datasets CIFAR10, CIFAR100, and Tiny ImageNet were run on an A100 GPU server with 32GB RAM and 16 CPU cores. Smaller experiments were conducted on a single A4000 GPU with 48GB RAM or CPU workstations.
Software Dependencies | No | The paper mentions training with SGD and cosine learning rate annealing, and implicitly uses PyTorch based on common deep learning practices, but does not provide specific version numbers for any software components or libraries.
Experiment Setup | Yes | Training hyperparameters: In our experiments, we use training hyperparameter configurations following broadly established standard settings (Simonyan & Zisserman, 2014; He et al., 2015; Zagoruyko & Komodakis, 2016), as displayed in Table 4. For both LeNet-300-100 and LeNet-5, we set the initial LR to 0.15 and found it to perform well across datasets... All models are trained with SGD and cosine learning rate annealing (Loshchilov & Hutter, 2022).