A stochastic gradient descent algorithm with random search directions
Authors: Eméric Gbaguidi
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The purpose of this section is to show the behavior of the SCORS algorithm on simulated data. For that goal, we consider the logistic regression model (Bach, 2014; Bercu et al., 2020) associated with the classical minimization problem (P) of the convex function f given, for all x ∈ R^d, by [...] We carry out the experiments on simulated data in order to illustrate the almost sure convergence (Theorem 1), the central limit theorem (Theorem 2) and the non-asymptotic L2 rate of convergence (Theorem 4). [...] Figure 1 illustrates the almost sure convergence of the algorithms. [...] Figure 2: We used 1000 samples, where each one was obtained by running the associated algorithm for n = 500000 iterations. [...] In Table 1, we report a comparison of the algorithm performances based on the computational cost. [...] Figure 3: Mean squared error with respect to epochs. |
| Researcher Affiliation | Academia | Eméric Gbaguidi, Institut de Mathématiques de Bordeaux, Université de Bordeaux |
| Pseudocode | No | The paper describes algorithms (SGD, SCGD, SCORS) using mathematical formulations and prose, but does not include a dedicated or clearly labeled pseudocode or algorithm block with structured steps. |
| Open Source Code | No | The paper does not contain any explicit statement about the release of source code for the described methodology, nor does it provide any links to a code repository. |
| Open Datasets | No | In Section 5, the authors state: 'We carry out the experiments on simulated data in order to illustrate the almost sure convergence [...] Therefore, we consider an independent and identically distributed collection {(w_1, y_1), . . . , (w_N, y_N)} where the covariate w ∼ N_d(0, I_d) and the response y ∈ {0, 1} is sampled such that P(y = 1 | w) = 1/(1 + e^{−⟨w, x⟩}). The true model parameter x ∈ R^d is selected uniformly from the unit sphere. Furthermore, we set the sample size N = 50000 and the parameter dimension d = 50.' The dataset is simulated based on a described procedure rather than being an external, publicly available dataset with a direct link, DOI, or citation. |
| Dataset Splits | No | The paper describes the generation of simulated data but does not specify any training, validation, or test splits for this data. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to conduct the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or specific solver versions) used in the experiments. |
| Experiment Setup | Yes | The stepsize γ_n = 1/n is used, where n ≥ 1 stands for the iterations. Here, we will compare the four methods (U), (NU), (G) and (S) described in Section 3. Let us define the initial value g_{1,k} given, for any k = 1, . . . , N, by g_{1,k} = ∇f_k(X_1). Moreover, the sequence (g_{n,k}) is updated, for all n ≥ 1 and 1 ≤ k ≤ N, as g_{n+1,k} = ∇f_k(X_n) if U_{n+1} = k, and g_{n+1,k} = g_{n,k} otherwise. For the non-uniform search distribution (NU), the probabilities p_{n,j} are computed, for all n ≥ 2 and j ∈ [[1, d]], as p_{n,j} = \|g_{n−1}^{(j)}\| / Σ_{i=1}^{d} \|g_{n−1}^{(i)}\| if j = j*, and p_{n,j} = (1 − p_{n,j*}) / (d − 1) otherwise, where j* = argmax_j \|g_{n−1}^{(j)}\| and g_{n−1} = Σ_{k=1}^{N} g_{n−1,k}. [...] we set the sample size N = 50000 and the parameter dimension d = 50. [...] Each epoch consists of running 1000 iterations. |
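Since the paper describes its simulation procedure in prose but releases no code, the data generation it specifies (w ∼ N_d(0, I_d), true parameter uniform on the unit sphere, Bernoulli responses with logistic link, N = 50000, d = 50) can be sketched in NumPy as follows; all variable names and the seed are ours:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 50_000, 50  # sample size and parameter dimension from the paper

# True parameter x, selected uniformly from the unit sphere
# (a standard normal vector normalized to unit length).
x = rng.standard_normal(d)
x /= np.linalg.norm(x)

# Covariates w_i ~ N_d(0, I_d) stacked as rows of W, and Bernoulli
# responses with P(y = 1 | w) = 1 / (1 + exp(-<w, x>)).
W = rng.standard_normal((N, d))
p = 1.0 / (1.0 + np.exp(-(W @ x)))
y = (rng.uniform(size=N) < p).astype(int)
```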
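The gradient table (g_{n,k}) quoted in the Experiment Setup row is updated lazily: only the entry selected by U_{n+1} is recomputed at iteration n. A minimal sketch, assuming the standard logistic negative log-likelihood f_k(x) = log(1 + e^{⟨w_k, x⟩}) − y_k ⟨w_k, x⟩ (the paper's exact per-sample loss and normalization are not quoted here, so this form is an assumption):

```python
import numpy as np

def grad_fk(x, w_k, y_k):
    """Gradient of the assumed logistic loss f_k(x) = log(1 + e^<w_k,x>) - y_k <w_k,x>,
    which works out to (sigmoid(<w_k, x>) - y_k) * w_k."""
    s = 1.0 / (1.0 + np.exp(-(w_k @ x)))
    return (s - y_k) * w_k

def update_table(G, x_n, W, y, u):
    """Lazy table update: g_{n+1,k} = grad f_k(X_n) if U_{n+1} = k, else g_{n,k}.
    G has one row per sample k; only row u is refreshed."""
    G[u] = grad_fk(x_n, W[u], y[u])
    return G
```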
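The non-uniform search distribution (NU) can likewise be sketched from the quoted formula. Our reading of the extraction-damaged text: the coordinate j* with the largest absolute gradient component receives mass \|g^{(j*)}\| / Σ_i \|g^{(i)}\|, and the remaining mass is spread uniformly over the other d − 1 coordinates (this uniform spread is our inference from the requirement that the p_{n,j} sum to 1, not a verbatim quote):

```python
import numpy as np

def nonuniform_probs(g):
    """Search distribution over coordinates given an aggregated gradient
    estimate g: p[j*] = |g[j*]| / sum_i |g[i]| at j* = argmax_j |g[j]|,
    with the leftover mass split evenly over the other coordinates."""
    a = np.abs(g)
    d = a.size
    j_star = int(np.argmax(a))
    p_star = a[j_star] / a.sum()  # assumes g is not the zero vector
    p = np.full(d, (1.0 - p_star) / (d - 1))
    p[j_star] = p_star
    return p
```

A coordinate drawn from this distribution (e.g. via `rng.choice(d, p=p)`) would then play the role of the random search direction at that iteration.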