Hierarchical and Stochastic Crystallization Learning: Geometrically Leveraged Nonparametric Regression with Delaunay Triangulation
Authors: Jiaqi Gu, Guosheng Yin
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study the asymptotic properties of our method and conduct numerical experiments on both synthetic and real data to demonstrate the advantages of our method over the existing ones. |
| Researcher Affiliation | Academia | Jiaqi Gu (EMAIL), Department of Mathematics and Statistics, University of South Florida, Tampa, FL 33620, USA; Guosheng Yin (EMAIL), Department of Statistics and Actuarial Science, School of Computing and Data Science, University of Hong Kong, Hong Kong SAR, China |
| Pseudocode | Yes | Algorithm 1 DELAUNAYSPARSE (Chang et al., 2020) ... Algorithm 2 Crystallization search ... Algorithm 3 Stochastic crystallization search |
| Open Source Code | No | The paper discusses various algorithms (Algorithm 1, 2, 3) and compares methods but does not provide any explicit statement about releasing its own implementation code or a link to a code repository. The license link provided is for the paper itself, not the code. |
| Open Datasets | Yes | We conduct numerical experiments on both synthetic and real data... apply the deterministic crystallization learning to several real data sets from the UCI repository. The critical assessment of protein structure prediction (CASP) data set (Betancourt and Skolnick, 2001)... The Concrete data set (Yeh, 1998)... Parkinson's telemonitoring data set... We also apply the hierarchical crystallization learning to the Year Prediction MSD data set from the UCI repository. |
| Dataset Splits | Yes | For each data set, we simulate 100 training data sets {(xi, yi) : i = 1, . . . , n} under different values of sample size n and dimension d. For each training data set, we evaluate the prediction performance of our method on 100 randomly generated target points z1, . . . , z100. ... For each data set, we take 100 bootstrap samples without replacement of size n (n = 200, 500, 1000 or 2000) for training and 100 bootstrap samples of size 100 for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU models, or cloud computing instance types used for running the experiments. It only refers to general 'computation time/power' in a theoretical context. |
| Software Dependencies | No | The paper mentions several methods and algorithms such as k-NN, local linear regression, kernel regression, Gaussian process models, and GAM (Generalized Additive Models, Hastie and Tibshirani (1990)), but it does not specify the version numbers of any software libraries, programming languages, or tools used for their implementation. |
| Experiment Setup | Yes | We implement the crystallization learning with L = 3 for d = 5, 10 and L = 2 for d = 20, 50... For k = 1, . . . , 100, we implement the stochastic crystallization learning to estimate µ(zk) with B = 100 randomly generated sets of simplices under the energy distribution (5) with the maximal energy loss Λ = 0, 0.1, . . . , 3.0. ... We implement the two-layer hierarchical crystallization learning with n = 2C representative points and L = 2... |
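Since the paper releases no implementation code, the following is only an illustrative sketch of the geometric primitive the method builds on: predicting at a target point by barycentric interpolation within the Delaunay simplex that contains it. It uses `scipy.spatial.Delaunay` and hypothetical synthetic data; it is not the authors' crystallization-learning algorithm, which searches for a local sub-triangulation rather than triangulating all of the data.

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)

# Hypothetical training data: n covariate points in d dimensions
# with a smooth noisy response (not the paper's simulation designs).
n, d = 200, 2
X = rng.uniform(size=(n, d))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(n)

# Delaunay triangulation of the covariates.
tri = Delaunay(X)

def delaunay_interpolate(z):
    """Predict at target z by barycentric interpolation over the
    vertices of the Delaunay simplex containing z.
    Returns NaN if z lies outside the convex hull of the data."""
    s = tri.find_simplex(z)
    if s == -1:
        return np.nan
    # Barycentric coordinates of z within simplex s,
    # via the precomputed affine transform for that simplex.
    T = tri.transform[s]
    b = T[:d].dot(np.asarray(z) - T[d])
    w = np.append(b, 1 - b.sum())  # weights sum to 1
    return w.dot(y[tri.simplices[s]])

print(delaunay_interpolate(np.array([0.5, 0.5])))
```

A full triangulation of all n points is what the paper's crystallization search avoids: its Algorithms 2 and 3 grow only a local set of simplices around the target point, which is what makes the approach feasible for larger d.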