Learning by Unsupervised Nonlinear Diffusion
Authors: Mauro Maggioni, James M. Murphy
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement LUND and confirm its theoretical properties on illustrative data sets, demonstrating its theoretical and empirical advantages over both spectral and density-based clustering. |
| Researcher Affiliation | Academia | Mauro Maggioni: Department of Mathematics, Department of Applied Mathematics and Statistics, Mathematical Institute of Data Sciences, Institute of Data Intensive Engineering and Science, Johns Hopkins University, Baltimore, MD 21218, USA. James M. Murphy: Department of Mathematics, Tufts University, Medford, MA 02155, USA. |
| Pseudocode | Yes | Algorithm 1: Learning by Unsupervised Nonlinear Diffusion (LUND). Input: X (data), σ0 (kernel density bandwidth), σ (diffusion scaling parameter), t (time parameter), τ (threshold). Output: Y (cluster assignments), K̂ (estimated number of clusters). |
| Open Source Code | No | The paper does not explicitly state that source code for the described methodology is provided, nor does it include a link to a code repository. |
| Open Datasets | No | The paper uses "synthetic data" and describes how these are generated (e.g., "samples are drawn from µG"), rather than referencing specific, publicly available datasets with concrete access information. |
| Dataset Splits | No | The paper states: "All experiments are conducted on randomly generated data, with results averaged over 100 trials." This indicates data generation and repeated trials, but no explicit training/test/validation splits on a fixed dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper mentions techniques like 'cover trees' and 'Gaussian kernel' but does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow, or specific library versions). |
| Experiment Setup | Yes | The diffusion distances are computed by truncating (2.2) to sum only over the M = 100 ≪ n eigenpairs with largest eigenvalue modulus, and the KDE p(x) uses 100 nearest neighbors with σ0 = 1. ... P is constructed with σ = .15. ... LUND used parameters (σ, t) = (.15, 10^6) for these data. |
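The pseudocode row above summarizes the inputs and outputs of LUND (Algorithm 1). As a rough illustration of how those pieces fit together, the following is a minimal NumPy sketch, not the authors' implementation: it assumes dense eigendecomposition of the diffusion matrix, a simple Gaussian KDE over all points (rather than the paper's 100-nearest-neighbor KDE), and estimates K̂ from the largest gap in the sorted mode scores; the threshold τ is accepted but unused in this simplification.

```python
import numpy as np

def lund(X, sigma0, sigma, t, tau, M=100):
    """Simplified sketch of LUND (Algorithm 1). Returns labels Y and K_hat.
    tau (low-density threshold) is accepted but unused in this sketch."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # Row-normalized Gaussian kernel -> Markov transition matrix P
    W = np.exp(-D2 / sigma**2)
    P = W / W.sum(axis=1, keepdims=True)
    # Truncate to the M eigenpairs of largest modulus, as in the setup row
    lam, Psi = np.linalg.eig(P)
    idx = np.argsort(-np.abs(lam))[:min(M, n)]
    lam, Psi = lam[idx].real, Psi[:, idx].real
    # Diffusion coordinates at time t; Dt is the diffusion distance matrix
    emb = Psi * (lam ** t)
    Dt = np.sqrt(((emb[:, None, :] - emb[None, :, :]) ** 2).sum(-1))
    # Kernel density estimate p(x) with bandwidth sigma0 (all points here,
    # not the paper's 100 nearest neighbors)
    p = np.exp(-D2 / sigma0**2).sum(axis=1)
    # rho_t(x): diffusion distance to the nearest higher-density point
    rho = np.empty(n)
    for i in range(n):
        higher = p > p[i]
        rho[i] = Dt[i, higher].min() if higher.any() else Dt[i].max()
    # Cluster modes maximize p * rho; K_hat from the largest score gap
    score = p * rho
    order = np.argsort(-score)
    Delta = score[order]
    K_hat = int(np.argmax(Delta[:-1] / (Delta[1:] + 1e-12))) + 1
    modes = order[:K_hat]
    # Label modes, then assign remaining points in decreasing density order
    # to the label of their Dt-nearest labeled higher-density point
    Y = -np.ones(n, dtype=int)
    Y[modes] = np.arange(K_hat)
    for i in np.argsort(-p):
        if Y[i] >= 0:
            continue
        cand = np.where((p > p[i]) & (Y >= 0))[0]
        if cand.size == 0:
            cand = modes  # fallback: nearest mode
        Y[i] = Y[cand[np.argmin(Dt[i, cand])]]
    return Y, K_hat
```

On well-separated data this recovers both the labels and the number of clusters without K being supplied, which is the property the "Research Type" row refers to.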