Mining Topological Structure in Graphs through Forest Representations
Authors: Robin Vandaele, Yvan Saeys, Tijl De Bie
JMLR 2020 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We qualitatively and quantitatively confirm the applicability, effectiveness, and scalability of our method for discovering backbones in a variety of graph-structured data, such as social networks, earthquake locations scattered across the Earth, and high-dimensional cell trajectory data. 3. Experiments In this section, we show how our method is applicable to a wide variety of graphs arising from different fields of science. For our applications, we will focus on topological data analysis, visualization, and graph simplifications. Table 2 summarizes the network sizes for each graph G (n = |V (G)|, m = |E(G)|), runtimes, and our quantitative results for the metrics introduced in Section 3.4.1. |
| Researcher Affiliation | Academia | Robin Vandaele1,2 EMAIL Yvan Saeys2 EMAIL Tijl De Bie1 Tijl.De EMAIL 1IDLab, Department of Electronics and Information Systems Ghent University Technologiepark-Zwijnaarde 19, 9052 Gent, Belgium 2Data mining and Modelling for Biomedicine (Da MBi), VIB Inflammation Research Center Technologiepark-Zwijnaarde 927, 9052 Gent, Belgium |
| Pseudocode | Yes | Appendix B. Algorithms and Computational Costs In this section, we analyze the computational cost and provide basic pseudocode for the algorithms used in our method for topological data analysis of graph-structured data sets. Algorithm 1: Computing the boundary coefficients through sparse matrices. Algorithm 2: Computing the f-pine. Algorithm 3: Constrained Leaves Optimal sub Forest (CLOF) in trees. Algorithm 4: Constrained Leaves Optimal sub Forest (CLOF) in forests. |
| Open Source Code | No | The paper does not provide concrete access to its own source code. It mentions `https://github.com/dynverse` for a wrapper for the Slingshot method (a baseline), and `https://zenodo.org/record/1443566# .Xab5de Yza02` for datasets, but no explicit statement or link for the authors' own implementation code. |
| Open Datasets | Yes | We obtained a data set containing information on 80 549 earthquakes ranging between the years 1950 and 2017. This data is freely accessible from USGS Earthquake Search. The data used to construct this graph is publicly available on aminer.org/citation. The original graph can be found at github.com/hzjken/character-network... obtainable from https://shiring.github.io/ networks/2017/05/15/got_final... We generated a synthetic data set of 1000 points... all of which (including those above) may be obtained from https://zenodo.org/record/1443566# .Xab5de Yza02. |
| Dataset Splits | No | The paper describes the creation of kNN graphs and the estimation of parameters for its method, but does not specify train/test/validation splits for a supervised learning context or for reproducibility of data partitioning. For instance, in Section 3.4.3, it states: "We evaluated our method over multiple values k {5, 10, 15, 20, 25} for building k NN graphs from our diffusion coordinates" but does not detail data splits. |
| Hardware Specification | Yes | All these results were obtained using non-optimized R code on a machine equipped with an Intel R Core TM i7 processor at 2.6GHz and 8GB of RAM. |
| Software Dependencies | No | The paper mentions that results were obtained using "non-optimized R code" and refers to the "brainwaver library in R" when discussing existing methods. However, it does not provide specific version numbers for R or any of the libraries used in their own implementation, which is required for reproducibility. |
| Experiment Setup | Yes | Each of our 333 considered data sets described in Section 3.1 was embedded in a 20 dimensional space using diffusion maps with Pearson correlations and standard settings in R. We evaluated our method over multiple values k {5, 10, 15, 20, 25} for building k NN graphs from our diffusion coordinates, using the Euclidean distance between points. A number of leaves l, 2 l 30, was estimated for each connected component in our BC-pine, by minimizing the second-order finite differences of the function mapping the number of leaves to the corresponding vertex betweenness cost, as discussed in Section 2.4.2. |