AdaGK-SGD: Adaptive Global Knowledge Guided Distributed Stochastic Gradient Descent
Authors: Hangyu Ye, Weiying Xie, Yunsong Li, Leyuan Fang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerically, we find that AdaGK-SGD can significantly improve the accuracy and generalizability of distributed algorithms compared with existing methods. ... Experiments In this section, we evaluate AdaGK-SGD and the improved version with MLGK module of SlowMo (Wang et al. 2019), EASGD (Zhang, Choromanska, and LeCun 2015), and BMUF-Adam (Chen, Ding, and Huo 2020) on a variety of different image classification models and datasets. ... Performance on CIFAR-10/100. ... Performance on ILSVRC2012. ... Parametric Analysis |
| Researcher Affiliation | Academia | 1 State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an 710071, China 2 College of Electrical and Information Engineering, Hunan University, Changsha 410082, China EMAIL, EMAIL, EMAIL, leyuan EMAIL |
| Pseudocode | Yes | Algorithm 1: AdaGK-SGD. Require: network scale N, global averaging period τ, total number of iterations T, learning rate η, auxiliary variable parameter α, initial parameter w_init. Initialize: w^(0) = w_init, w_Global^(0) = w_init, Z^(0) = 0, G^(0) = 0, M = τ. 1: for k = 1, 2, ..., T, every worker i do; 2: sample ξ_i^(k+1), update g_i^(k) = ∇L_i(ξ_i^(k+1), w_i^(k)); 3: µ(k) = max{s : s ≤ k and s mod τ = 0}; 4: if k = µ(k) then; 5: w_Global^(µ(k)) = (1/N) Σ_{i=1}^N w_i^(µ(k)); 6: compute Z_i^(µ(k)) based on Equation 13 or other methods; 7: end if; 8: w_i^(k+1/2) = w_i^(k) − η g_i^(k); 9: determine ψ based on Equation 14; 10: G_i^(k) = ψα (Z_i^(µ(k)) − w_i^(k+1/2)); 11: compute M of global knowledge based on Equation 19 or 22; 12: w_i^(k+1) = w_i^(k+1/2) + G_i^(k); 13: end for |
| Open Source Code | Yes | Code: https://github.com/Yehangyu-XD/AdaGK-SGD |
| Open Datasets | Yes | The datasets we use for our experiments are CIFAR10/100 and ILSVRC2012. |
| Dataset Splits | No | The datasets we use for our experiments are CIFAR10/100 and ILSVRC2012. ... The TOP-1 test accuracy of AdaGK-SGD and improved algorithms compared with that of the baseline and the original version of SlowMo, EASGD, and BMUF-Adam. ... When it is not necessary to specify the parameters, the epoch is set to 100, and the local batch size is set to 256. The paper mentions "test accuracy" but does not specify how the datasets were split into training, validation, and test sets, nor does it explicitly state that standard splits were used with reference. |
| Hardware Specification | Yes | All experiments on the dataset CIFAR are performed on 4 NVIDIA GTX 3090 GPUs. All experiments on the dataset ILSVRC2012 are performed on 4 NVIDIA A100-SXM. |
| Software Dependencies | No | To ensure the reliability and validity of the experiments, the models in training are implemented with PyTorch. The paper mentions PyTorch but does not provide a version number for it or for any other key software dependency. |
| Experiment Setup | Yes | When it is not necessary to specify the parameters, the epoch count is set to 100 and the local batch size is set to 256. All experiments use the warm-up (Goyal et al. 2017) algorithm to improve convergence: the learning rate is linearly increased to 0.01 over the first 5 epochs and then decays to 10^-6 following a cosine schedule. |
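The pseudocode in the table above can be sketched in runnable form. This is a minimal simulation, not the authors' implementation: the function name `adagk_sgd` is ours, workers are simulated sequentially rather than in parallel, and the paper's Equations 13, 14, 19, and 22 (computing Z, ψ, and M) are replaced with simple stand-ins (Z set to the global average, ψ fixed at 1). Only the control flow of Algorithm 1 is reproduced.

```python
import numpy as np

def adagk_sgd(grads_fn, w_init, n_workers=4, tau=5, T=20, eta=0.1, alpha=0.5):
    """Sketch of Algorithm 1 (AdaGK-SGD) with stand-in Z and psi.

    grads_fn(w, i) returns worker i's stochastic gradient at parameters w.
    """
    w = [w_init.copy() for _ in range(n_workers)]
    # Stand-in for Z^(0): initialized to the starting parameters.
    Z = [w_init.copy() for _ in range(n_workers)]
    for k in range(1, T + 1):
        # Lines 3-7: every tau iterations, average parameters globally
        # and refresh the global-knowledge variable Z (Eq. 13 stand-in).
        if k % tau == 0:
            w_global = np.mean(w, axis=0)
            Z = [w_global.copy() for _ in range(n_workers)]
        for i in range(n_workers):
            g = grads_fn(w[i], i)                # line 2: stochastic gradient
            w_half = w[i] - eta * g              # line 8: local SGD step
            psi = 1.0                            # line 9: Eq. 14 stand-in
            G = psi * alpha * (Z[i] - w_half)    # line 10: global-knowledge pull
            w[i] = w_half + G                    # line 12: guided update
    return np.mean(w, axis=0)
```

On a toy quadratic objective shared by all workers, the iterates are pulled toward the minimizer both by the local gradient step and by the periodic global-knowledge correction.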
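The learning-rate policy quoted in the Experiment Setup row (linear warm-up to 0.01 over the first 5 epochs, then cosine decay to 10^-6 over the remaining epochs) can be written as a small helper. The function name `lr_schedule` and its per-epoch granularity are our assumptions; the paper may apply the schedule per iteration.

```python
import math

def lr_schedule(epoch, total_epochs=100, warmup_epochs=5,
                peak_lr=0.01, final_lr=1e-6):
    """Linear warm-up to peak_lr, then cosine decay to final_lr."""
    if epoch < warmup_epochs:
        # Linear ramp: reaches peak_lr at the last warm-up epoch.
        return peak_lr * (epoch + 1) / warmup_epochs
    # Cosine anneal from peak_lr down to final_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return final_lr + 0.5 * (peak_lr - final_lr) * (1 + math.cos(math.pi * progress))
```

With the defaults matching the paper's stated settings, epoch 0 starts at 0.002, epoch 4 reaches the 0.01 peak, and the final epochs approach 10^-6.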