A Coefficient Makes SVRG Effective
Authors: Yida Yin, Zhiqiu Xu, Zhiyuan Li, Trevor Darrell, Zhuang Liu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical analysis finds that, for deeper neural networks, the strength of the variance reduction term in SVRG should be smaller and decrease as training progresses. Inspired by this, we introduce a multiplicative coefficient α to control the strength and adjust it through a linear decay schedule. We name our method α-SVRG. We evaluate α-SVRG on a range of model architectures and multiple image classification datasets; it consistently achieves a lower training loss than both the baseline and standard SVRG. Our results highlight the value of SVRG in deep learning. 5 EXPERIMENTS Table 2 presents the results of training various models on ImageNet-1K. Table 3 displays the results of training ConvNeXt-F on various smaller datasets. |
| Researcher Affiliation | Collaboration | 1UC Berkeley 2University of Pennsylvania 3TTIC 4Meta AI Research |
| Pseudocode | Yes | The pseudocode for α-SVRG with SGD and AdamW as base optimizers is provided in Appendix G. Appendix G PSEUDOCODE FOR α-SVRG: Algorithm 1 α-SVRG with SGD, Algorithm 2 α-SVRG with AdamW |
| Open Source Code | Yes | Code is available at github.com/davidyyd/alpha-SVRG. |
| Open Datasets | Yes | We evaluate α-SVRG using ImageNet-1K classification (Deng et al., 2009) as well as smaller image classification datasets: CIFAR-100 (Krizhevsky, 2009), Pets (Parkhi et al., 2012), Flowers (Nilsback & Zisserman, 2008), STL-10 (Coates et al., 2011), Food-101 (Bossard et al., 2014), DTD (Cimpoi et al., 2014), SVHN (Netzer et al., 2011), and EuroSAT (Helber et al., 2019). |
| Dataset Splits | Yes | We evaluate α-SVRG using ImageNet-1K classification (Deng et al., 2009) as well as smaller image classification datasets: CIFAR-100 (Krizhevsky, 2009), Pets (Parkhi et al., 2012), Flowers (Nilsback & Zisserman, 2008), STL-10 (Coates et al., 2011), Food-101 (Bossard et al., 2014), DTD (Cimpoi et al., 2014), SVHN (Netzer et al., 2011), and EuroSAT (Helber et al., 2019). We report both final epoch training loss and top-1 validation accuracy. These are widely recognized benchmark datasets with established standard splits for training and validation. |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, memory specifications) used for running the experiments are provided in the paper. |
| Software Dependencies | No | The paper mentions using AdamW and SGD as optimizers and refers to PyTorch image models, but it does not specify software versions (e.g., Python version, PyTorch version, CUDA version) needed for replication. |
| Experiment Setup | Yes | Our basic training recipe, adapted from ConvNeXt (Liu et al., 2022): weight init trunc. normal (0.2); optimizer AdamW; base learning rate 4e-3; weight decay 0.05; optimizer momentum β1, β2 = 0.9, 0.999; learning rate schedule cosine decay; warmup schedule linear; randaugment (Cubuk et al., 2020) (9, 0.5); mixup (Zhang et al., 2018) 0.8; cutmix (Yun et al., 2019) 1.0; random erasing (Zhong et al., 2020) 0.25; label smoothing (Szegedy et al., 2016) 0.1. Table 6 lists the batch size, warmup epochs, and training epochs for each dataset. For larger models, we adhere to the original work (Dosovitskiy et al., 2021; Liu et al., 2022), using a stochastic depth rate of 0.4 for ViT-B and 0.5 for ConvNeXt-B. On small datasets, we choose the best α0 from {0.5, 0.75, 1}. For ImageNet-1K, we set α0 to 0.75 for smaller models and 0.5 for larger ones. |
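The core idea summarized above (an SVRG variance-reduction term scaled by a coefficient α that decays linearly from α0 toward 0) can be sketched with SGD on a toy least-squares problem. This is an illustrative sketch, not the paper's implementation: the problem setup (`A`, `b`), helper names (`grad_i`, `full_grad`, `alpha_svrg_sgd`), and hyperparameters are all assumptions chosen for a self-contained example.

```python
import numpy as np

# Hypothetical toy problem: per-sample least-squares losses 0.5*(A[i]@w - b[i])**2.
rng = np.random.default_rng(0)
n, d = 64, 10
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad_i(w, i):
    """Gradient of the i-th per-sample loss."""
    return (A[i] @ w - b[i]) * A[i]

def full_grad(w):
    """Full-batch gradient, used at each snapshot."""
    return (A.T @ (A @ w - b)) / n

def alpha_svrg_sgd(w, alpha0=0.75, lr=0.01, epochs=30):
    """alpha-SVRG with SGD as the base optimizer (sketch).

    alpha = 1 recovers standard SVRG; alpha = 0 recovers plain SGD.
    Here alpha decays linearly from alpha0 to ~0 over training,
    mirroring the linear decay schedule described in the paper.
    """
    total_steps = epochs * n
    step = 0
    for _ in range(epochs):
        w_snap = w.copy()        # snapshot weights, refreshed once per epoch
        mu = full_grad(w_snap)   # full gradient at the snapshot
        for i in rng.permutation(n):
            alpha = alpha0 * (1 - step / total_steps)  # linear decay
            # alpha-scaled variance-reduced gradient estimate
            g = grad_i(w, i) - alpha * (grad_i(w_snap, i) - mu)
            w -= lr * g
            step += 1
    return w

w = alpha_svrg_sgd(np.zeros(d))
```

The snapshot/full-gradient structure follows standard SVRG; the only change is the multiplicative coefficient on the control-variate term, which is the paper's contribution.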