reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Mining Topological Structure in Graphs through Forest Representations

Authors: Robin Vandaele, Yvan Saeys, Tijl De Bie

JMLR 2020 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We qualitatively and quantitatively conﬁrm the applicability, eﬀectiveness, and scalability of our method for discovering backbones in a variety of graph-structured data, such as social networks, earthquake locations scattered across the Earth, and high-dimensional cell trajectory data. 3. Experiments In this section, we show how our method is applicable to a wide variety of graphs arising from diﬀerent ﬁelds of science. For our applications, we will focus on topological data analysis, visualization, and graph simpliﬁcations. Table 2 summarizes the network sizes for each graph G (n = \|V (G)\|, m = \|E(G)\|), runtimes, and our quantitative results for the metrics introduced in Section 3.4.1.
Researcher Affiliation	Academia	Robin Vandaele1,2 EMAIL Yvan Saeys2 EMAIL Tijl De Bie1 Tijl.De EMAIL 1IDLab, Department of Electronics and Information Systems Ghent University Technologiepark-Zwijnaarde 19, 9052 Gent, Belgium 2Data mining and Modelling for Biomedicine (Da MBi), VIB Inﬂammation Research Center Technologiepark-Zwijnaarde 927, 9052 Gent, Belgium
Pseudocode	Yes	Appendix B. Algorithms and Computational Costs In this section, we analyze the computational cost and provide basic pseudocode for the algorithms used in our method for topological data analysis of graph-structured data sets. Algorithm 1: Computing the boundary coeﬃcients through sparse matrices. Algorithm 2: Computing the f-pine. Algorithm 3: Constrained Leaves Optimal sub Forest (CLOF) in trees. Algorithm 4: Constrained Leaves Optimal sub Forest (CLOF) in forests.
Open Source Code	No	The paper does not provide concrete access to its own source code. It mentions `https://github.com/dynverse` for a wrapper for the Slingshot method (a baseline), and `https://zenodo.org/record/1443566# .Xab5de Yza02` for datasets, but no explicit statement or link for the authors' own implementation code.
Open Datasets	Yes	We obtained a data set containing information on 80 549 earthquakes ranging between the years 1950 and 2017. This data is freely accessible from USGS Earthquake Search. The data used to construct this graph is publicly available on aminer.org/citation. The original graph can be found at github.com/hzjken/character-network... obtainable from https://shiring.github.io/ networks/2017/05/15/got_final... We generated a synthetic data set of 1000 points... all of which (including those above) may be obtained from https://zenodo.org/record/1443566# .Xab5de Yza02.
Dataset Splits	No	The paper describes the creation of kNN graphs and the estimation of parameters for its method, but does not specify train/test/validation splits for a supervised learning context or for reproducibility of data partitioning. For instance, in Section 3.4.3, it states: "We evaluated our method over multiple values k {5, 10, 15, 20, 25} for building k NN graphs from our diﬀusion coordinates" but does not detail data splits.
Hardware Specification	Yes	All these results were obtained using non-optimized R code on a machine equipped with an Intel R Core TM i7 processor at 2.6GHz and 8GB of RAM.
Software Dependencies	No	The paper mentions that results were obtained using "non-optimized R code" and refers to the "brainwaver library in R" when discussing existing methods. However, it does not provide specific version numbers for R or any of the libraries used in their own implementation, which is required for reproducibility.
Experiment Setup	Yes	Each of our 333 considered data sets described in Section 3.1 was embedded in a 20 dimensional space using diﬀusion maps with Pearson correlations and standard settings in R. We evaluated our method over multiple values k {5, 10, 15, 20, 25} for building k NN graphs from our diﬀusion coordinates, using the Euclidean distance between points. A number of leaves l, 2 l 30, was estimated for each connected component in our BC-pine, by minimizing the second-order ﬁnite diﬀerences of the function mapping the number of leaves to the corresponding vertex betweenness cost, as discussed in Section 2.4.2.