Scalable Approximations for Generalized Linear Problems
Authors: Murat Erdogdu, Mohsen Bayati, Lee H. Dicker
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we demonstrate the performance of our algorithm on well-known classification and regression problems, through extensive numerical studies on large-scale datasets, and show that it achieves the highest performance compared to several other widely used optimization algorithms. |
| Researcher Affiliation | Academia | Murat A. Erdogdu, Department of Computer Science and Department of Statistical Sciences, University of Toronto, Toronto, ON M5S 3G3, Canada; Mohsen Bayati, Graduate School of Business, Stanford University, Stanford, CA 94305, USA; Lee H. Dicker, Department of Statistics and Biostatistics, Rutgers University, Piscataway, NJ 08854, USA |
| Pseudocode | Yes | Algorithm 1 (SLS: Scaled Least Squares Estimator); Algorithm 2 (Conversion from one GLM to another) |
| Open Source Code | No | The paper does not provide a direct link to a source-code repository or an explicit statement about the release of code for the methodology described. |
| Open Datasets | Yes | The datasets we analyzed were: (i) a synthetic dataset generated from a logistic regression model with i.i.d. {exponential(1) − 1} predictors scaled by Σ^(1); (ii) the Higgs dataset (logistic regression) Baldi et al. (2014); (iii) a synthetic dataset generated from a Poisson regression model with i.i.d. binary(±1) predictors scaled by Σ^(2); (iv) the Covertype dataset (Poisson regression) Blackard and Dean (1999). |
| Dataset Splits | Yes | The test error is measured as the mean squared error of the estimated mean using the current parameters at each iteration on a test dataset, which is a randomly selected (and set-aside) 10% portion of the entire dataset. |
| Hardware Specification | No | The paper does not specify any particular hardware used for the experiments, such as GPU/CPU models or cloud computing resources. |
| Software Dependencies | No | The paper mentions R's built-in function glm for finding the MLE and various optimization algorithms such as Newton-Raphson, BFGS, LBFGS, gradient descent, and accelerated gradient descent. However, it does not specify version numbers for these software components or for any other libraries used. |
| Experiment Setup | Yes | For all the algorithms for computing the MLE, the step size at each iteration is chosen via the backtracking line search (Boyd and Vandenberghe, 2004). |
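The paper's Algorithm 1 (SLS) rests on the observation that the GLM MLE is approximately proportional to the OLS estimator, so one can fit OLS cheaply and then correct the scale with a one-dimensional root-finding step. The sketch below illustrates that recipe for the logistic case only; it is a minimal, hedged reconstruction, not the authors' implementation. The function names are my own, the OLS step uses a direct solve rather than the scalable stochastic solver the paper targets, and the scale equation is solved by bracketing and bisection rather than a Newton root finder.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def sls_logistic(X, y, c_max=1e6, tol=1e-10):
    """Sketch of the Scaled Least Squares idea for logistic regression.

    Step 1: ordinary least squares fit (a direct solve here for brevity;
    a scalable implementation would use a stochastic solver).
    Step 2: rescale by the scalar c solving
        c * mean_i psi''(c * <x_i, beta_ols>) = 1,
    where psi''(t) = sigmoid(t) * (1 - sigmoid(t)) is the second
    derivative of the logistic cumulant function.
    """
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    z = X @ beta_ols

    def f(c):
        p = sigmoid(c * z)
        return c * np.mean(p * (1.0 - p)) - 1.0

    # Bracket the ascending zero crossing by doubling, then bisect.
    lo, hi = 1.0, 2.0
    while f(hi) < 0 and hi < c_max:
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    c = 0.5 * (lo + hi)
    return c * beta_ols, c
```

On synthetic logistic data, the returned direction matches the OLS direction by construction, and the scale factor c > 1 compensates for the shrinkage of OLS relative to the MLE.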