Distributed Stochastic Gradient Descent: Nonconvexity, Nonsmoothness, and Convergence to Local Minima

Authors: Brian Swenson, Ryan Murray, H. Vincent Poor, Soummya Kar

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | "It is shown that, for each fixed initialization, D-SGD converges to critical points of the loss with probability one. Next, we consider the problem of avoiding saddle points... Results are proved by studying the underlying (distributed) gradient flow, using the ordinary differential equation (ODE) method of stochastic approximation. The remainder of the paper is organized as follows. Section 2 presents the main results, reviews related literature, and introduces notation to be used in the proofs. Sections 3-8 prove the main results (see Section 2.8 for an overview of these sections and the general proof strategy). Section 9 concludes the paper."
Researcher Affiliation | Academia | Brian Swenson (EMAIL), Applied Research Laboratory, Pennsylvania State University, State College, PA 16801; Ryan Murray (EMAIL), Department of Mathematics, North Carolina State University, Raleigh, NC 27695; H. Vincent Poor (EMAIL), Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ 08544; Soummya Kar (EMAIL), Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213
Pseudocode | No | The D-SGD algorithm is defined agentwise by the recursion
$$x_n(k+1) = x_n(k) - \alpha_k \big( \nabla f_n(x_n(k)) + \xi_n(k+1) \big) + \beta_k \sum_{\ell} \big( x_\ell(k) - x_n(k) \big), \quad (2)$$
for $n = 1, \ldots, N$, where the sum runs over the neighbors $\ell$ of agent $n$.
Open Source Code | No | The paper does not provide any statement or link regarding the availability of source code for the methodology described.
Open Datasets | No | "Concretely, suppose that $D_n = \{(x_i, y_i)\}_i$ represents a local data set collected or stored by agent $n$. Let $\ell(\cdot,\cdot)$ denote some predefined loss function, and let $h(\cdot, \theta)$ denote a parametric hypothesis class, with parameter $\theta$. In empirical risk minimization, the objective is to minimize the empirical risk over the data held by all agents, i.e., solve the optimization problem
$$\min_\theta \sum_{(x,y) \in \bigcup_n D_n} \ell(h(x,\theta), y) = \min_\theta \sum_{n=1}^{N} \sum_{(x,y) \in D_n} \ell(h(x,\theta), y),$$
where the objective above fits the form of (1) with $f_n(\theta) = \sum_{(x,y) \in D_n} \ell(h(x,\theta), y)$."
Dataset Splits | No | The paper does not describe any experimental evaluation using specific datasets, and therefore no information on dataset splits is provided.
Hardware Specification | No | The paper is theoretical in nature, focusing on mathematical proofs and convergence analysis of D-SGD. It does not describe any experiments or computations that would require specific hardware, hence no hardware specifications are provided.
Software Dependencies | No | The paper is purely theoretical, providing mathematical analysis and proofs. It does not describe any software implementations or experiments, so no software dependencies with version numbers are mentioned.
Experiment Setup | No | The paper focuses on theoretical analysis, theorems, and proofs for distributed stochastic gradient descent. It does not present any experimental results or describe an experimental setup with hyperparameters or training configurations.
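As a concrete illustration of the D-SGD recursion (2) quoted in the Pseudocode row, the following is a minimal NumPy sketch of one agentwise iterate. The function name `dsgd_step` and the dictionary-of-neighbor-lists representation are illustrative assumptions, not constructs from the paper.

```python
import numpy as np

def dsgd_step(x, grads, noise, neighbors, alpha_k, beta_k):
    """One agentwise D-SGD iterate in the spirit of recursion (2):
    a noisy local gradient step plus a consensus term pulling each
    agent toward its neighbors' iterates."""
    x_new = np.empty_like(x)
    for n in range(len(x)):
        # Consensus term: sum of (x_l(k) - x_n(k)) over neighbors l of agent n.
        consensus = sum(x[l] - x[n] for l in neighbors[n])
        # Noisy gradient step weighted by alpha_k; consensus weighted by beta_k.
        x_new[n] = x[n] - alpha_k * (grads[n] + noise[n]) + beta_k * consensus
    return x_new
```

With decaying step-size sequences (alpha_k, beta_k) and, say, two agents minimizing $f_n(x) = x^2/2$ (so the gradient at $x$ is $x$), repeated calls drive the agents toward consensus at the critical point $x = 0$, matching the paper's convergence statement.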
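The Open Datasets row quotes the empirical-risk decomposition: the risk over the pooled data equals the sum of the per-agent risks $f_n$. A small sketch checking that identity numerically, with an illustrative squared loss and linear hypothesis class; all function names here are assumptions made for the example, not the paper's notation.

```python
def sq_loss(pred, y):
    # Illustrative choice of the loss ell(., .): squared error.
    return (pred - y) ** 2

def h(x, theta):
    # Illustrative hypothesis class h(., theta): linear prediction.
    return theta * x

def agent_risk(D_n, theta):
    # f_n(theta) = sum over (x, y) in D_n of ell(h(x, theta), y).
    return sum(sq_loss(h(x, theta), y) for x, y in D_n)

def global_risk(datasets, theta):
    # Empirical risk over all agents' data: sum over n of f_n(theta).
    return sum(agent_risk(D_n, theta) for D_n in datasets)
```

For disjoint local data sets, the risk computed over the pooled data coincides with $\sum_n f_n(\theta)$, which is exactly the identity in the quoted passage.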