Learning by Unsupervised Nonlinear Diffusion
Authors: Mauro Maggioni, James M. Murphy
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement LUND and confirm its theoretical properties on illustrative data sets, demonstrating its theoretical and empirical advantages over both spectral and density-based clustering. |
| Researcher Affiliation | Academia | Mauro Maggioni: Department of Mathematics, Department of Applied Mathematics and Statistics, Mathematical Institute of Data Sciences, Institute of Data Intensive Engineering and Science, Johns Hopkins University, Baltimore, MD 21218, USA. James M. Murphy: Department of Mathematics, Tufts University, Medford, MA 02155, USA. |
| Pseudocode | Yes | Algorithm 1: Learning by Unsupervised Nonlinear Diffusion (LUND). Input: X (data), σ0 (kernel density bandwidth), σ (diffusion scaling parameter), t (time parameter), τ (threshold). Output: Y (cluster assignments), K̂ (estimated number of clusters). |
| Open Source Code | No | The paper does not explicitly state that source code for the described methodology is provided, nor does it include a link to a code repository. |
| Open Datasets | No | The paper uses "synthetic data" and describes how these are generated (e.g., "samples are drawn from µG"), rather than referencing specific, publicly available datasets with concrete access information. |
| Dataset Splits | No | The paper states: "All experiments are conducted on randomly generated data, with results averaged over 100 trials." This indicates data generation and repeated trials, but no explicit training/test/validation splits on a fixed dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper mentions techniques like 'cover trees' and 'Gaussian kernel' but does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow, or specific library versions). |
| Experiment Setup | Yes | The diffusion distances are computed by truncating (2.2) to sum only over the M = 100 ≪ n eigenpairs with largest eigenvalue modulus, and the KDE p(x) uses 100 nearest neighbors with σ0 = 1. ... P is constructed with σ = .15. ... LUND used parameters (σ, t) = (.15, 10^6) for these data. |
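The pseudocode row above summarizes the inputs and outputs of LUND (Algorithm 1). As a rough illustration of how those pieces fit together, the following is a minimal NumPy sketch, not the authors' implementation: it assumes dense eigendecomposition of the diffusion matrix, a simple Gaussian KDE over all points (rather than the paper's 100-nearest-neighbor KDE), and estimates K̂ from the largest gap in the sorted mode scores; the threshold τ is accepted but unused in this simplification.

```python
import numpy as np

def lund(X, sigma0, sigma, t, tau, M=100):
    """Simplified sketch of LUND (Algorithm 1). Returns labels Y and K_hat.
    tau (low-density threshold) is accepted but unused in this sketch."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # Row-normalized Gaussian kernel -> Markov transition matrix P
    W = np.exp(-D2 / sigma**2)
    P = W / W.sum(axis=1, keepdims=True)
    # Truncate to the M eigenpairs of largest modulus, as in the setup row
    lam, Psi = np.linalg.eig(P)
    idx = np.argsort(-np.abs(lam))[:min(M, n)]
    lam, Psi = lam[idx].real, Psi[:, idx].real
    # Diffusion coordinates at time t; Dt is the diffusion distance matrix
    emb = Psi * (lam ** t)
    Dt = np.sqrt(((emb[:, None, :] - emb[None, :, :]) ** 2).sum(-1))
    # Kernel density estimate p(x) with bandwidth sigma0 (all points here,
    # not the paper's 100 nearest neighbors)
    p = np.exp(-D2 / sigma0**2).sum(axis=1)
    # rho_t(x): diffusion distance to the nearest higher-density point
    rho = np.empty(n)
    for i in range(n):
        higher = p > p[i]
        rho[i] = Dt[i, higher].min() if higher.any() else Dt[i].max()
    # Cluster modes maximize p * rho; K_hat from the largest score gap
    score = p * rho
    order = np.argsort(-score)
    Delta = score[order]
    K_hat = int(np.argmax(Delta[:-1] / (Delta[1:] + 1e-12))) + 1
    modes = order[:K_hat]
    # Label modes, then assign remaining points in decreasing density order
    # to the label of their Dt-nearest labeled higher-density point
    Y = -np.ones(n, dtype=int)
    Y[modes] = np.arange(K_hat)
    for i in np.argsort(-p):
        if Y[i] >= 0:
            continue
        cand = np.where((p > p[i]) & (Y >= 0))[0]
        if cand.size == 0:
            cand = modes  # fallback: nearest mode
        Y[i] = Y[cand[np.argmin(Dt[i, cand])]]
    return Y, K_hat
```

On well-separated data this recovers both the labels and the number of clusters without K being supplied, which is the property the "Research Type" row refers to.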