Mask in the Mirror: Implicit Sparsification
Authors: Tom Jacobs, Rebekka Burkholz
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of PILoT in extensive experiments covering three different scenarios. Firstly, we confirm our theoretical results on the gradient flow in Theorem 2.3. Secondly, we compare PILoT with other state-of-the-art continuous sparsification methods such as STR (Kusupati et al., 2020) and spred (Ziyin & Wang, 2023) in a one-shot setting. In this context, we also isolate the individual contribution of our initialization. Finally, we combine PILoT with iterative pruning methods such as WR (Frankle & Carbin, 2019) and LRR (Maene et al., 2021). ... In experiments for diagonal linear networks and vision benchmarks (including ImageNet), PILoT consistently outperforms baseline sparsification methods such as STR and spred, which demonstrates the utility of our theoretical insights. |
| Researcher Affiliation | Academia | Tom Jacobs, CISPA Helmholtz Center for Information Security; Rebekka Burkholz, CISPA Helmholtz Center for Information Security |
| Pseudocode | Yes | Algorithm 1 PILoT<br>Require: epochs T, schedule α_init, initialization x_init, scaling constant β<br>Initialize m_0, w_0 such that m_0 ⊙ w_0 = x_init and m_0² + w_0² = β; set δ > 1, K, and α_0 ← α_init<br>Current training acc ← 0<br>Set f(m, w, α_0) := f(m ⊙ w) + α_0(‖m‖²_{L2} + ‖w‖²_{L2})<br>for k in 1 … T do<br>  (m_k, w_k) = OptimizerStep(f(m_{k−1}, w_{k−1}, α_{k−1}))<br>  if Training acc ≥ Current training acc and ‖m_k ⊙ w_k‖_{L1} ≤ K and k ≤ T/2 then α_k ← α_{k−1}·δ else α_k ← α_{k−1}/δ end if<br>  Current training acc ← Training acc<br>end for<br>return Model f(x_T) with x_T = m_T ⊙ w_T |
| Open Source Code | No | The codebase for the experiments is written in PyTorch and torchvision and their relevant primitives for model construction and data-related operations. |
| Open Datasets | Yes | Firstly, we compare our method PILoT with STR, spred, and LASSO on CIFAR10 and CIFAR100 training a ResNet-20 or ResNet-18, respectively. ... In Table 1, we compare PILoT to both STR and spred on ImageNet (Deng et al., 2009). |
| Dataset Splits | No | The paper uses well-known datasets (CIFAR10, CIFAR100, ImageNet) that come with standard training, validation, and test splits. However, it does not explicitly state the split percentages or counts, cite the splits used in its experiments, or describe any custom splitting methodology. |
| Hardware Specification | Yes | The experiments in the paper are trained on an NVIDIA A6000. In addition, the diagonal linear network is trained on a 13th Gen Intel(R) Core(TM) i9-13900H CPU. |
| Software Dependencies | No | The codebase for the experiments is written in PyTorch and torchvision and their relevant primitives for model construction and data-related operations. |
| Experiment Setup | Yes | Table 2: One-shot experiment. Optimizer: SGD; Momentum: 0.9; Batch size: 256; Activation function: ReLU; Weight decay: 10⁻⁴; Base learning rate: {0.1, 0.2}; Epochs: 150; Warmup period: 0; Initialization: Kaiming normal; Scaling: 1 (only for m ⊙ w); δ: 1.01; K: 8000; Learning rate schedule: cosine warmup. ... Table 3: ResNet-50 on ImageNet configurations for each sparsity (%). ... Table 5: WR and LRR experiment on ImageNet. Optimizer: SGD; Momentum: 0.9; Batch size: 512; Activation function: ReLU; Weight decay: {0, 10⁻⁴}; Learning rate schedule: step warmup; Base learning rate: {0.1, 0.2}; Cycles: 25; Pruning rate: 0.8; Epochs per cycle: 90; Warmup period: 10; Initialization: Kaiming normal; L2 regularization: 5·10⁻⁵ (only for m ⊙ w); PILoT regularization: {0} (only for m ⊙ w); Scaling: 1 (only for m ⊙ w); δ: 1 (only for m ⊙ w); K: (only for m ⊙ w). |
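The core of Algorithm 1 above is a dynamic schedule for the regularization strength α: it is multiplied by δ while training accuracy does not drop, the L1 norm of m ⊙ w stays below the budget K, and training is in its first half; otherwise it is divided by δ. A minimal pure-Python sketch of that update rule (the function name and argument names are ours, not from the paper):

```python
def update_alpha(alpha, delta, train_acc, prev_acc, l1_norm, K, step, T):
    """One step of the alpha schedule from Algorithm 1 (PILoT).

    Increase regularization (alpha * delta) while training accuracy has not
    dropped, ||m * w||_L1 stays below the budget K, and we are still in the
    first half of the T training epochs; otherwise relax it (alpha / delta).
    """
    if train_acc >= prev_acc and l1_norm <= K and step <= T // 2:
        return alpha * delta
    return alpha / delta
```

With the one-shot settings of Table 2 (δ = 1.01, K = 8000, T = 150), a caller would invoke this once per epoch after the optimizer step, feeding back the current training accuracy and the L1 norm of the effective weights.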
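The reason the m ⊙ w parametrization with L2 regularization (as in Algorithm 1 and the spred baseline) sparsifies is that, for a fixed product x = m·w, the minimal value of (m² + w²)/2 over all factorizations is |x|, attained at the balanced point |m| = |w| = √|x|; so L2 on the factors acts as L1 on the product. A small numeric illustration of that identity (the helper name is hypothetical):

```python
import math


def balanced_factor_penalty(x):
    """Minimal L2 penalty (m^2 + w^2) / 2 over factorizations m * w = x.

    The minimum is attained at the balanced factorization
    |m| = |w| = sqrt(|x|), where the penalty equals |x| exactly,
    i.e. the L1 norm of the product.
    """
    m = math.sqrt(abs(x))          # balanced magnitude
    w = math.copysign(m, x)        # carry the sign so that m * w == x
    return 0.5 * (m * m + w * w)
```

This is a sketch of the identity only; the paper's contribution is the time-dependent schedule and mirror-flow analysis on top of this parametrization, not the identity itself.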