Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Dynamic Sparse Training of Diagonally Sparse Networks

Authors: Abhishek Tyagi, Arjun Iyer, William H Renninger, Christopher Kanan, Yuhao Zhu

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluations on diverse neural architectures demonstrate that our method maintains accuracy on par with unstructured counterparts while benefiting from tangible computational gains.
Researcher Affiliation | Academia | 1) Department of Computer Science, University of Rochester, Rochester, NY, USA; 2) The Institute of Optics, University of Rochester, Rochester, NY, USA. Correspondence to: Abhishek Tyagi <EMAIL>.
Pseudocode | No | For a comprehensive understanding of the implementation details, including pseudocode and additional optimizations, we direct readers to (Okanovic et al., 2024).
Open Source Code | Yes | Our source code is available at https://github.com/horizon-research/DynaDiag/.
Open Datasets | Yes | 1. CIFAR-10 (Krizhevsky & Hinton, 2009) consists of 60,000 colored images of resolution 32×32, divided into 10 classes (e.g., airplanes, cars, birds). The dataset is split into 50,000 training and 10,000 test images. 2. CIFAR-100 (Krizhevsky & Hinton, 2009) also contains 32×32 resolution images but spans 100 classes. Each class includes 500 training and 100 test images, totaling 60,000 images. 3. ImageNet-1K (Deng et al., 2009) covers 1,000 object classes, with 1.28M training, 50,000 validation, and 100,000 test images. Images are typically resized and cropped to 224×224 for processing. 4. WikiText-103 (Merity et al., 2016) comprises over 100 million tokens extracted from verified Wikipedia articles.
Dataset Splits | Yes | The dataset is split into 50,000 training and 10,000 test images (CIFAR-10). Each class includes 500 training and 100 test images (CIFAR-100). 1.28M training, 50,000 validation, and 100,000 test images (ImageNet-1K).
Hardware Specification | Yes | All experiments are conducted on NVIDIA A100 GPUs with the following configuration: Model: NVIDIA A100 80GB; Memory: 80GB HBM2e; Memory Bandwidth: 2.0 TB/s (higher than the 40GB version); TDP: 400W (PCIe: 300W); Peak FP32 Performance: 19.5 TFLOPS (same as 40GB); Peak FP16 Performance: 312 TFLOPS (same as 40GB).
Software Dependencies | No | RigL: We use cuSPARSE... DSB and Pixelated BFly: Both methods yield block-sparse weight matrices. We use the Triton-based library from Pixelated BFly (Dao et al., 2021)... Our PyTorch implementation does not exploit CUDA kernel optimizations...
Experiment Setup | Yes | Table 3: Configuration of the CIFAR-10 and CIFAR-100 experiments with MLPMixer. Table 4: Configuration of the CIFAR-10 and CIFAR-100 experiments with ViT-Small. Table 5: Configuration of the ImageNet experiments with ViT-Base and MLPMixer. Table 6: Configuration of the ImageNet experiments with ViT-Large and ViT-Huge. Table 7: Configuration of the WikiText-103 experiments with GPT-2 Small.
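
For concreteness, the dataset splits quoted in the table above correspond to the standard torchvision splits. The sketch below is illustrative only and is not taken from the authors' repository; the transform choices and the ImageNet path are assumptions, and the paper's exact preprocessing is defined in its configuration tables (Tables 3-7).

import torchvision
import torchvision.transforms as T

# CIFAR-10: 50,000 training / 10,000 test images at 32x32 (standard split).
cifar10_train = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor())
cifar10_test = torchvision.datasets.CIFAR10(
    root="./data", train=False, download=True, transform=T.ToTensor())

# CIFAR-100: 500 training / 100 test images per class across 100 classes.
cifar100_train = torchvision.datasets.CIFAR100(
    root="./data", train=True, download=True, transform=T.ToTensor())

# ImageNet-1K: 1.28M training / 50,000 validation images; resize and
# center-crop to 224x224 is a common evaluation pipeline, assumed here.
imagenet_val = torchvision.datasets.ImageNet(
    root="/path/to/imagenet", split="val",
    transform=T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()]))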