Newton-Stein Method: An Optimization Method for GLMs via Stein's Lemma

Authors: Murat A. Erdogdu

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we empirically demonstrate that our algorithm achieves the highest performance compared to various optimization algorithms on several data sets." "In this section, we validate the performance of Newton-Stein method through extensive numerical studies. We experimented on two commonly used GLM optimization problems, namely, Logistic Regression (LR) and Linear Regression (OLS)."
Researcher Affiliation | Academia | Murat A. Erdogdu, Department of Statistics, Stanford University, Stanford, CA 94305-4065, USA
Pseudocode | Yes | Algorithm 1: Newton-Stein Method
    Input: β̂_0, |S|, ε, {γ_t}_{t≥0}
    1. Estimate the covariance using a random sub-sample S ⊂ [n]:
         Σ̂_S = (1/|S|) Σ_{i∈S} x_i x_iᵀ
    2. while ‖β̂_{t+1} − β̂_t‖_2 > ε do
         μ̂_2(β̂_t) = (1/n) Σ_{i=1}^n φ⁽²⁾(⟨x_i, β̂_t⟩),  μ̂_4(β̂_t) = (1/n) Σ_{i=1}^n φ⁽⁴⁾(⟨x_i, β̂_t⟩)
         Q_t = (1/μ̂_2(β̂_t)) [ Σ̂_S⁻¹ − β̂_t β̂_tᵀ / ( μ̂_2(β̂_t)/μ̂_4(β̂_t) + ⟨Σ̂_S β̂_t, β̂_t⟩ ) ]
         β̂_{t+1} = β̂_t − γ_t Q_t ∇_β ℓ(β̂_t)
    3. end while
    Output: β̂_t
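As a concrete illustration, the algorithm above can be sketched for logistic regression, where the cumulant generating function is φ(z) = log(1 + e^z), so φ⁽²⁾(z) = σ(z)(1 − σ(z)) and φ⁽⁴⁾(z) = σ(z)(1 − σ(z))(1 − 6σ(z) + 6σ(z)²). This is a minimal sketch, not the paper's implementation; the function name, sub-sample size, step size, ridge term, and the guard on the Sherman-Morrison denominator are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_stein_logistic(X, y, sub_size=200, eps=1e-6, gamma=0.3, max_iter=300):
    """Sketch of Algorithm 1 for logistic regression (X: (n, p), y in {0, 1}).

    Hyperparameter defaults are illustrative, not values from the paper.
    """
    n, p = X.shape
    rng = np.random.default_rng(0)
    # Step 1: covariance estimate from a random sub-sample S of [n].
    S = rng.choice(n, size=min(sub_size, n), replace=False)
    Sigma_S = X[S].T @ X[S] / len(S)
    # Small ridge term added for numerical stability (an assumption, not in the paper).
    Sigma_S_inv = np.linalg.inv(Sigma_S + 1e-8 * np.eye(p))

    beta = np.zeros(p)
    for _ in range(max_iter):
        s = sigmoid(X @ beta)
        # Stein coefficients: empirical 2nd and 4th derivatives of phi(z) = log(1 + e^z).
        mu2 = np.mean(s * (1.0 - s))
        mu4 = np.mean(s * (1.0 - s) * (1.0 - 6.0 * s + 6.0 * s**2))
        # Sherman-Morrison form of the scaling matrix Q_t; skip the rank-one
        # correction if its denominator is near zero (a practical safeguard).
        denom = mu2 / mu4 + beta @ (Sigma_S @ beta)
        if abs(denom) < 1e-8:
            Q = Sigma_S_inv / mu2
        else:
            Q = (Sigma_S_inv - np.outer(beta, beta) / denom) / mu2
        # Gradient of the average negative log-likelihood.
        grad = X.T @ (s - y) / n
        beta_new = beta - gamma * (Q @ grad)
        if np.linalg.norm(beta_new - beta) <= eps:
            return beta_new
        beta = beta_new
    return beta
```

Note that the covariance is estimated once from the sub-sample S, while the Stein coefficients μ̂_2, μ̂_4 and the gradient are recomputed over all n samples at every iteration, which is what makes the per-iteration cost O(np) rather than the O(np²) of a full Newton step.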
Open Source Code | No | The paper contains no explicit statement about releasing source code, no link to a code repository, and no mention of code in supplementary materials.
Open Datasets | Yes | "We experimented on two real data sets where the data sets are downloaded from UCI repository (Lichman, 2013). Both data sets satisfy n ≫ p, but we highlight the difference between the proportions of dimensions n/p. See Table 2 for details."
Dataset Splits | No | The paper runs experiments on synthetic and real datasets but does not describe how they were split into training, validation, or test sets, nor does it specify any cross-validation methodology.
Hardware Specification | No | The paper reports computation times in seconds but gives no details about the hardware (e.g., CPU or GPU model, memory) used to run the experiments.
Software Dependencies | No | The paper discusses various optimization algorithms but does not name the software used to implement or run them, nor any versioned dependencies (programming languages, libraries, frameworks, or solvers).
Experiment Setup | No | The paper states, "For all the algorithms, we use a constant step size that provides the fastest convergence," and notes that parameters such as the sub-sample size |S| and rank r are selected "by following the guidelines described in Section 4.4." However, it does not report the specific numerical values of these hyperparameters (the constant step size, |S|, or r) used in each experiment.