HiGrad: Uncertainty Quantification for Online Learning and Stochastic Approximation

Authors: Weijie J. Su, Yuancheng Zhu

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, the performance of HiGrad is evaluated through extensive simulation studies and a real data example. An R package higrad has been developed to implement the method."
Researcher Affiliation | Collaboration | Weijie J. Su (EMAIL), University of Pennsylvania, USA; Yuancheng Zhu (EMAIL), Renaissance Technologies LLC, USA
Pseudocode | Yes | "Algorithm 1: The HiGrad Algorithm"
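To give a flavor of what Algorithm 1 does (a shared SGD segment followed by parallel threads whose averages feed a t-based confidence interval), here is a simplified one-split sketch in Python. All function names, defaults, and the mean-estimation toy problem are assumptions for illustration, not the paper's code; the actual Algorithm 1 uses a general splitting tree and a weighted combination of segment averages along each root-to-leaf path.

```python
import numpy as np

def higrad_ci(grad, theta0, n0=1000, n1=1000, T=4,
              gamma=lambda j: 0.1 * j ** -0.55, seed=0):
    """Simplified one-split HiGrad-style sketch (illustrative only).

    Runs a shared SGD segment of n0 steps, then T independent threads
    of n1 steps each from the shared endpoint; each thread reports the
    Ruppert-Polyak average of its own iterates, and the T averages
    yield a t-based confidence interval with T - 1 degrees of freedom.
    """
    rng = np.random.default_rng(seed)
    theta = theta0
    for j in range(1, n0 + 1):               # shared segment
        theta = theta - gamma(j) * grad(theta, rng)
    estimates = []
    for _ in range(T):                       # independent threads
        th, avg = theta, 0.0
        for j in range(n0 + 1, n0 + n1 + 1):
            th = th - gamma(j) * grad(th, rng)
            avg += th
        estimates.append(avg / n1)
    estimates = np.array(estimates)
    center = estimates.mean()
    se = estimates.std(ddof=1) / np.sqrt(T)
    t975 = 3.182                             # t quantile, T - 1 = 3 dof, 95%
    return center, (center - t975 * se, center + t975 * se)

# Toy problem: estimate the mean of N(2, 1) via the squared loss,
# whose stochastic gradient is theta - x with x ~ N(2, 1).
center, (lo, hi) = higrad_ci(
    lambda th, rng: th - (2.0 + rng.standard_normal()), theta0=0.0)
```

Treating the T = 4 thread averages as approximately exchangeable is the simplification here; the paper's construction corrects for the correlation induced by the shared segment.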
Open Source Code | Yes | "An R package higrad has been developed to implement the method. Keywords: HiGrad, stochastic gradient descent, online learning, stochastic approximation, Ruppert-Polyak averaging, uncertainty quantification, t-confidence interval... To facilitate the use of HiGrad in practice, we set a default configuration of this method in our R package higrad (https://cran.r-project.org/web/packages/higrad/) through balancing between contrasting and sharing, showing its satisfactory performance in a variety of scenarios in Section 5."
Open Datasets | Yes | "To illustrate this, we apply SGD to the Adult dataset hosted on the UCI Machine Learning Repository (Lichman, 2013) as an example... We use the preprocessed version hosted on the LibSVM repository (Chang and Lin, 2011), which has 123 binary features and contains 32,561 samples."
Dataset Splits | Yes | "The original dataset contains 14 features, of which 6 are continuous and 8 are categorical. We use the preprocessed version hosted on the LibSVM repository (Chang and Lin, 2011), which has 123 binary features and contains 32,561 samples. We randomly pick 1,000 as a test set, and the rest as a training set."
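The split described above (1,000 random test samples out of 32,561, the rest for training) can be sketched as follows; the seed and index-only approach are assumptions, since the paper does not specify how the random draw was made.

```python
import numpy as np

# Hypothetical reproduction of the Adult-dataset split: shuffle all
# 32,561 sample indices, hold out the first 1,000 as the test set.
rng = np.random.default_rng(0)  # seed is illustrative, not from the paper
perm = rng.permutation(32561)
test_idx, train_idx = perm[:1000], perm[1000:]
```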
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory specifications) used for running its experiments. It describes the experimental setup in terms of algorithms, datasets, step sizes, and number of iterations, but not the underlying hardware.
Software Dependencies | No | The paper mentions that an "R package higrad has been developed to implement the method" and provides a link to its CRAN repository. However, it does not specify the version of the R language itself, or any other library dependencies with version numbers.
Experiment Setup | Yes | "The step size γ_j is set to 0.1 j^{-0.55} and 0.4 j^{-0.55} for linear regression and logistic regression, respectively, and θ_0 is initialized randomly with a N(0, 0.01 I) distribution. Three types of the true coefficients θ are examined: a null case where θ_1 = ... = θ_d = 0, a dense case where θ_1 = ... = θ_d = 1, and a sparse case where θ_1 = ... = θ_{d/10} = √(10/d), θ_{d/10+1} = ... = θ_d = 0. Table 1 presents the HiGrad configurations considered in the simulation studies. Note that all four HiGrad configurations have T = 4 threads. ... The step size is taken to be γ_j = 0.5 j^{-0.505} and the initial points are chosen as earlier in Section 5.1."
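The quoted setup pins down the three coefficient regimes and the step-size schedules exactly, so they are easy to reproduce. A minimal sketch (function names are illustrative; d is assumed divisible by 10 in the sparse case, which makes the signal norm ||θ||_2 = 1 there):

```python
import numpy as np

def true_coefficients(d, case):
    """Three simulated coefficient regimes from the quoted setup."""
    if case == "null":                 # theta_1 = ... = theta_d = 0
        return np.zeros(d)
    if case == "dense":                # theta_1 = ... = theta_d = 1
        return np.ones(d)
    if case == "sparse":               # first d/10 entries are sqrt(10/d),
        theta = np.zeros(d)            # the rest zero, so ||theta||_2 = 1
        theta[: d // 10] = np.sqrt(10.0 / d)
        return theta
    raise ValueError(f"unknown case: {case}")

# Step-size schedules gamma_j from the quoted setup:
def gamma_lin(j):
    return 0.1 * j ** -0.55            # linear regression

def gamma_log(j):
    return 0.4 * j ** -0.55            # logistic regression
```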