Graph-Dependent Implicit Regularisation for Distributed Stochastic Subgradient Descent

Authors: Dominic Richards, Patrick Rebeschini

JMLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We present numerical experiments to show that the qualitative nature of the upper bounds we derive can be representative of real behaviours."
Researcher Affiliation | Academia | Dominic Richards, Department of Statistics, University of Oxford, 24-29 St Giles, Oxford, OX1 3LB; Patrick Rebeschini, Department of Statistics, University of Oxford, 24-29 St Giles, Oxford, OX1 3LB
Pseudocode | No | The paper describes the Distributed SGD algorithm using a mathematical formula (1) but does not present it as a clearly labeled 'Pseudocode' or 'Algorithm' block with structured steps.
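The update in formula (1) is a standard consensus-plus-subgradient step for Distributed SGD. A minimal sketch of one such step is given below; the gossip weights, sizes, and variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def distributed_sgd_step(X, P, subgrads, step):
    """One round of Distributed SGD: each node averages its neighbours'
    iterates through the gossip matrix P, then takes a local stochastic
    subgradient step (the general structure of update (1))."""
    # Consensus step: mix iterates according to the doubly stochastic matrix P.
    mixed = P @ X
    # Local step: each node v moves along its own stochastic subgradient.
    return mixed - step * subgrads

# Tiny illustration on a 3-node graph with uniform mixing weights.
P = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
X = np.zeros((3, 2))   # all nodes initialised at zero, as in the experiments
g = np.ones((3, 2))    # placeholder subgradients, one row per node
X_next = distributed_sgd_step(X, P, g, step=0.1)
```

Since all nodes start at zero, the consensus step is a no-op here and each node simply moves along its own subgradient.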
Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide links to a code repository.
Open Datasets | No | The paper mentions a simulated dataset: "a simulated data set with a total of N = mn observations {Z_i}, i ∈ [N], are sampled following the experiment within Duchi et al. (2012)." This indicates a simulation procedure based on prior work, not a publicly available dataset provided by the authors or a standard benchmark with specific access information.
Dataset Splits | No | The paper states, "The data set is then randomly spread across the graph with each node getting m samples." This describes how data is distributed among nodes but does not specify training, validation, or test splits needed for reproduction.
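The data-distribution step quoted above (N = mn samples randomly spread across n nodes, m per node) can be sketched as follows; the sizes and the Gaussian stand-in data are illustrative assumptions, not the paper's simulated distribution.

```python
import numpy as np

# Hypothetical sketch: shuffle the N = m * n samples and hand each
# of the n nodes a disjoint shard of m samples.
rng = np.random.default_rng(0)
n, m, d = 4, 5, 3                      # illustrative sizes, not the paper's
Z = rng.standard_normal((n * m, d))    # stand-in for the simulated samples
perm = rng.permutation(n * m)          # random spread across the graph
shards = [Z[perm[v * m:(v + 1) * m]] for v in range(n)]
```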
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running its experiments. It only mentions using a Python library for calculations.
Software Dependencies | No | The paper mentions using "the lbfgs solver within the Logistic Regression function of the python library scikit (Pedregosa et al., 2011)" but does not provide specific version numbers for scikit-learn or Python.
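The reported setup corresponds to scikit-learn's `LogisticRegression` with `solver="lbfgs"`. A minimal usage sketch on toy data (the synthetic data here is an assumption; the paper uses its own simulated set):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary classification data, nearly separable along the first feature.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = (X[:, 0] + 0.1 * rng.standard_normal(200) > 0).astype(int)

# The solver the paper reports: lbfgs within scikit-learn's LogisticRegression.
clf = LogisticRegression(solver="lbfgs").fit(X, y)
```

Note that without pinned scikit-learn and Python versions, defaults such as the regularisation strength `C` may differ across releases, which is exactly the reproducibility gap flagged above.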
Experiment Setup | Yes | Dimension and Monte Carlo estimate size are set to d = 100 and N̂ = 1000, respectively. We investigate the performance of Distributed SGD in two sample size regimes, m = 2 and m = 100. Distributed SGD is run for 15 different time horizons t, between 10^2 and either 10^7 or 10^6.5 for graph sizes n = 32 or n = 10^2, respectively. All runs are initialised from X_v^1 = 0 for all v ∈ V. Comparisons are made for three choices of the step size, as prescribed in Corollary 6, and for three choices of the graph topology: complete graph (α = 0), grid (α = 1/2), and cycle (α = 1). Specifically, the two fixed step size choices considered are: ρ_Const = O(1/√(nm)), to align with serial single-machine SGD; and ρ_Const-Net = O((1 − σ₂(P))/√(nm)).
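The two fixed step sizes depend on the graph only through the second-largest eigenvalue σ₂(P) of the gossip matrix. A sketch of computing them for the cycle topology follows; the lazy-random-walk gossip weights and the choice of setting all hidden O(·) constants to 1 are assumptions for illustration.

```python
import numpy as np

def cycle_gossip_matrix(n):
    """Doubly stochastic gossip matrix for the n-cycle: each node keeps
    weight 1/2 and gives 1/4 to each of its two neighbours (an assumed
    lazy-random-walk choice, not specified here by the paper)."""
    P = np.eye(n) * 0.5
    for v in range(n):
        P[v, (v - 1) % n] += 0.25
        P[v, (v + 1) % n] += 0.25
    return P

n, m = 32, 100                                # graph size, samples per node
P = cycle_gossip_matrix(n)
sigma2 = np.sort(np.linalg.eigvalsh(P))[-2]   # second-largest eigenvalue of P

# The two fixed step sizes above, with all O(.) constants set to 1:
rho_const = 1.0 / np.sqrt(n * m)                  # serial-SGD-aligned step
rho_const_net = (1.0 - sigma2) / np.sqrt(n * m)   # network-dependent step
```

Because σ₂(P) approaches 1 as the cycle grows, the network-dependent step size shrinks relative to the serial one, which is the graph dependence the experiments probe.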