Asymptotic Analysis of Conditioned Stochastic Gradient Descent
Authors: Rémi Leluc, François Portier
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For the sake of completeness and illustrative purposes, we compare the performance of classical stochastic gradient descent (SGD) and the conditioned variant (CSGD) presented in Appendix B, where the matrix Φ_k is an average of past Hessian estimates as given in Equation (22). We compare equal weights ω_{j,k} = (k+1)^{−1} and adaptive weights ω_{j,k} ∝ exp(−η‖θ_j − θ_k‖) with η > 0, which give more importance to Hessian estimates associated with iterates that are close to the current point. Furthermore, for computational reasons, we consider a novel adaptive stochastic first-order method which is a variant of AdaGrad. Starting from the null vector θ_0 = (0, …, 0) ∈ ℝ^d, we use an optimal learning rate of the form γ_k = α/(k + k_0) (Bottou et al., 2018) and set λ_k^(m) ≡ 0, λ_k^(M) = Λk in the experiments, where α, k_0 and Λ are tuned using a grid search. The means of the optimality ratio k ↦ [F(θ_k) − F(θ*)]/[F(θ_0) − F(θ*)], obtained over 100 independent runs, are presented in the figures below. |
| Researcher Affiliation | Academia | Rémi Leluc EMAIL CMAP, École Polytechnique, Institut Polytechnique de Paris, Palaiseau (France); François Portier EMAIL CREST, ENSAI, École Nationale de la Statistique et de l'Analyse de l'Information, Rennes (France) |
| Pseudocode | No | The paper describes algorithms using mathematical equations and textual descriptions, such as "θ_{k+1} = θ_k − γ_{k+1} C_k g(θ_k, ξ_{k+1}), k ≥ 0". However, it does not include a distinct block explicitly labeled as "Pseudocode" or "Algorithm" with structured steps. |
| Open Source Code | No | The paper mentions "implemented in widely used programming tools (Pedregosa et al., 2011; Abadi et al., 2016)", referring to third-party software like scikit-learn and TensorFlow. However, it does not provide any explicit statement about releasing its own source code for the methodology described, nor does it include a link to a code repository. |
| Open Datasets | Yes | Real-world data. We now turn our attention to real-world data and consider again the Ridge regression problem on the following datasets: Boston Housing dataset (Harrison Jr & Rubinfeld, 1978) (n = 506; d = 14) and Diabetes dataset (Dua & Graff, 2017) (n = 442; d = 10). |
| Dataset Splits | No | The paper mentions using "simulated data" and "real-world data" (Boston Housing, Diabetes datasets) and states, "We use a batch-size equal to |B| = 16." This specifies the mini-batch size but does not provide information regarding how the datasets were split into training, validation, or test sets for reproduction. |
| Hardware Specification | No | The paper does not explicitly describe any specific hardware (e.g., GPU models, CPU types, memory specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions "widely used programming tools (Pedregosa et al., 2011; Abadi et al., 2016)" such as scikit-learn and TensorFlow in a general context. However, it does not provide a list of specific software dependencies, libraries, or frameworks with their version numbers that were used for the authors' implementation. |
| Experiment Setup | Yes | Starting from the null vector θ_0 = (0, …, 0) ∈ ℝ^d, we use an optimal learning rate of the form γ_k = α/(k + k_0) (Bottou et al., 2018) and set λ_k^(m) ≡ 0, λ_k^(M) = Λk in the experiments, where α, k_0 and Λ are tuned using a grid search. The means of the optimality ratio k ↦ [F(θ_k) − F(θ*)]/[F(θ_0) − F(θ*)], obtained over 100 independent runs, are presented in the figures below. We use a batch-size equal to |B| = 16. |
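The conditioned update quoted above (θ_{k+1} = θ_k − γ_{k+1} C_k g(θ_k, ξ_{k+1}), with Φ_k an equal-weight average of past Hessian estimates and eigenvalues clipped to [λ_k^(m), λ_k^(M)]) can be sketched as follows. This is a minimal illustration, not the authors' released code: the ridge-regression objective, the function name `conditioned_sgd`, and the default constants are assumptions chosen for the example.

```python
import numpy as np

def conditioned_sgd(A, b, lam=0.1, alpha=1.0, k0=10.0, Lam=100.0,
                    batch=16, steps=500, seed=0):
    """Sketch of CSGD: theta_{k+1} = theta_k - gamma_{k+1} C_k g(theta_k, xi_{k+1}).

    Phi_k averages past mini-batch Hessian estimates with equal weights
    omega_{j,k} = (k+1)^{-1}; C_k inverts Phi_k after clipping its spectrum
    (here to [1e-8, Lam*k], a stand-in for [lambda_k^(m), lambda_k^(M)]).
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    theta = np.zeros(d)        # theta_0 = (0, ..., 0)
    phi = np.zeros((d, d))     # running average of Hessian estimates
    for k in range(steps):
        idx = rng.choice(n, size=batch, replace=False)
        Ab, bb = A[idx], b[idx]
        # ridge objective F(theta) = ||A theta - b||^2/(2n) + lam ||theta||^2/2
        g = Ab.T @ (Ab @ theta - bb) / batch + lam * theta   # gradient estimate
        H = Ab.T @ Ab / batch + lam * np.eye(d)              # Hessian estimate
        phi = (k * phi + H) / (k + 1)                        # equal-weight average
        w, V = np.linalg.eigh(phi)                           # clip the spectrum
        w = np.clip(w, 1e-8, Lam * (k + 1))
        C = V @ np.diag(1.0 / w) @ V.T                       # C_k ~ Phi_k^{-1}
        gamma = alpha / (k + 1 + k0)                         # gamma_k = alpha/(k + k0)
        theta = theta - gamma * C @ g
    return theta
```

The optimality ratio reported in the paper can then be computed against the closed-form ridge solution, e.g. `(F(theta_k) - F(theta_star)) / (F(theta_0) - F(theta_star))`, averaged over independent runs.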