reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Algorithms for ridge estimation with convergence guarantees

Authors: Wanli Qiao, Wolfgang Polonik

JMLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We propose two novel algorithms, and provide theoretical guarantees for their convergences, by which we mean that the algorithms can asymptotically recover the full ridge set. We consider the new algorithms as alternatives to the Subspace Constrained Mean Shift (SCMS) algorithm for which no such theoretical guarantees are known. ... The performance is illustrated using some numerical studies in R^2 ... Simulation study ... Real Data Application
Researcher Affiliation	Academia	Wanli Qiao, Department of Statistics, George Mason University ... Wolfgang Polonik, Department of Statistics, University of California
Pseudocode	Yes	Basic Algorithm 1: Alternative SCMS approach using an estimated ridgeness function. Input: y_i^0 = X_i, i = 1, ..., n, a > 0, h > 0. Update: For i = 0, 1, 2, ..., n, for j = 1, 2, ..., while y_i^j ∈ [0, 1]^d: y_i^{j+1} = y_i^j a Π \hat{η}(y_i^j) [ξ \hat{f}(y_i^j)] Π \hat{f}(y_i^j) \hat{f}(y_i^j), (2.9) else: discard the entire sequence y_i^0, y_i^1, ... Output: {y_i : \hat{η}(y_i) = 0, λ \hat{f} k+1(y_i) < 0}.
Open Source Code	No	No explicit statement or link for open-source code release for the methodology described in this paper is provided.
Open Datasets	Yes	We apply our algorithms to a data set of active and extinct volcanoes in Japan available at https://en.wikipedia.org/wiki/List_of_volcanoes_in_Japan.
Dataset Splits	No	No specific training/test/validation dataset splits are explicitly provided. The simulation study mentions '200 random samples' and '200 replicates of size 10000'. The real data application uses 'all the sample points as starting points'.
Hardware Specification	No	No specific hardware details (e.g., exact GPU/CPU models, processor types) are provided for running the experiments. The paper mentions 'resources provided by the Office of Research Computing at George Mason University' but without further specifications.
Software Dependencies	No	No specific ancillary software details with version numbers are provided in the paper.
Experiment Setup	Yes	The bandwidth h used in the kernel estimates was set to be 0.2 for sample size n = 500 and τ = 0.001 for Algorithm 2. ... Algorithm 2 with step length a = 0.005 ... For all the three algorithms, we used the same 200 replicates of size 10000, the same bandwidth 0.3, and the same 50 × 50 grid points as the starting points. We set τ = 0.005 for Algorithm 2.
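
As an illustration of the Experiment Setup row, the following is a minimal Python sketch of how the quoted comparison settings could be recorded and how a 50 × 50 grid of starting points might be built. It is not the authors' code. The dictionary keys and function names are hypothetical, and the grid extent [0, 1]^2 is an assumption borrowed from the domain check in the pseudocode row; the source does not state where the grid was placed.

```python
import numpy as np

# Settings quoted in the Experiment Setup row for the three-algorithm comparison.
# The key names are hypothetical; only the values come from the source.
COMPARISON_SETUP = {
    "n_replicates": 200,
    "sample_size": 10_000,
    "bandwidth": 0.3,
    "grid_side": 50,
    "tau_algorithm_2": 0.005,
}


def make_grid_starting_points(side, low=0.0, high=1.0):
    """Return a (side * side, 2) array of evenly spaced starting points.

    Assumption: the grid covers [low, high]^2. The source only says
    that 50 x 50 grid points were used as starting points.
    """
    axis = np.linspace(low, high, side)
    xx, yy = np.meshgrid(axis, axis, indexing="ij")
    return np.column_stack([xx.ravel(), yy.ravel()])


def inside_unit_cube(points):
    """Boolean mask of the rows of `points` that lie in [0, 1]^d.

    This mirrors the 'while y in [0, 1]^d' check in the pseudocode row.
    """
    return np.all((points >= 0.0) & (points <= 1.0), axis=1)


starts = make_grid_starting_points(COMPARISON_SETUP["grid_side"])
```

Here `starts` holds one row per starting point, and `inside_unit_cube` can be reused to decide which iterates to keep while a sequence is being updated.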