Recovering Manifold Structure Using Ollivier Ricci Curvature
Authors: Tristan L. Saidi, Abigail Hickok, Andrew J Blumberg
ICLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that our method outperforms alternative pruning methods and that it significantly improves performance on many downstream geometric data analysis tasks that use nearest neighbor graphs as input. Specifically, we evaluate on manifold learning, persistent homology, dimension estimation, and others. We also show that ORC-MANL can be used to improve clustering and manifold learning of single-cell RNA sequencing data. Finally, we provide empirical convergence experiments that support our theoretical findings. Our experiments are divided into two sections. Firstly, we evaluate ORC-MANL on a variety of synthetic manifolds for a broad array of benchmark geometric data analysis and machine learning tasks. We also compare pruning accuracy to several benchmark methods presented previously in the literature. |
| Researcher Affiliation | Academia | Tristan Luca Saidi1 , Abigail Hickok2, Andrew J. Blumberg3 1Department of Computer Science, Columbia University 2Department of Statistics & Data Science, Wu Tsai Institute, Yale University 3Departments of Mathematics and Computer Science, Irving Institute for Cancer Dynamics, Columbia University *Correspondence: EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 ORC-MANL Require: G = (V, E) a nearest neighbor graph, λ, δ 1: C {} 2: for (x, y) in E do candidate selection 3: κ(x, y) Ollivier Ricci(G, (x, y)) 4: if κ(x, y) 1 + 4 1 δ then Lemma A.1 5: C C {(x, y)} 6: E E \ C 7: G (V, E ) 8: for (x, y) in C do shortcut detection 9: d G (x, y) Shortest Path(G , x, y) 10: if d G (x, y) > π(π+1)(1 λ) 24λ ϵ then Theorem 3.3 11: E E \ {(x, y)} 12: return (V, E) |
| Open Source Code | Yes | Our code for ORC-MANL and all experiments are available on Git Hub. Link to implementation: https://github.com/Tristan Saidi/orcml |
| Open Datasets | Yes | We evaluate the efficacy of ORC-MANL on real-world data by using it to analyze cell-type annotated sc RNAseq data of (1) anterolateral motor cortex (ALM) brain cells in mice available from the Allen Institute (Abdelaal et al., 2019), and (2) sc RNAseq data of peripheral blood mononuclear cells (PBMC) available from 10XGenomics. ... we include experiments in Appendix A.5.3 that suggest that ORC-MANL is effective on MNIST (Deng, 2012) and KMNIST (Clanuwat et al., 2018) |
| Dataset Splits | No | For all experiments in Section 4.1, datapoints are sampled according to ρ defined in eq. (1) for each synthetic manifold M. To achieve this, we sample uniformly from the underlying m-dimensional manifold M (unless otherwise specified) driven by the m-dimensional volume form defined by the Euclidean metric. ... for both datasets we extract the 2000 most variable genes, followed by PCA (Pearson, 1901) to obtain a 50-dimensional representation of the original data. ... We also use k = 200 and n = 4000 (where n is the number of points in the sample) for the experiments in Figure 7. The paper describes data generation and preprocessing but does not explicitly provide details about training/validation/test splits. |
| Hardware Specification | Yes | For empirical evidence to support this claim, we provide experiments that compare the wall-clock time of ORC-MANL and UMAP for the Swiss Roll in Figure 10. For this experiment, both algorithms were run on a 16-core machine with 187 GB of RAM. |
| Software Dependencies | Yes | We use the UMAP implementation from Mc Innes et al. (2018b) and the ORC implementation from Ni et al. (2019), both of which use multiprocessing. ... We present results for persistent homology applied to nearest neighbor graphs with and without ORC-MANL preprocessing in Figure 6. ... We use The GUDHI Project (2020) to compute persistence distances with p = . |
| Experiment Setup | Yes | Our algorithm parameters for ORC-MANL are fixed at δ = 0.8 and λ = 0.01 across all manifolds and trials (and δ for ORC ONLY is similarly fixed at 0.8) in Table 1. ... We also use k = 200 and n = 4000 (where n is the number of points in the sample) for the experiments in Figure 7. ... In our experiments in Figure 7 we use n = 4000 points and rmax = 50 for the swiss roll and rmax = 5 for the adjacent spheres. ... for both datasets we extract the 2000 most variable genes, followed by PCA (Pearson, 1901) to obtain a 50-dimensional representation of the original data. ... Parameters are set to k = 20, δ = 0.8, and λ = 0.01 unless the said parameter is being varied. |