reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

NetSDM: Semantic Data Mining with Network Analysis

Authors: Jan Kralj, Marko Robnik-Šikonja, Nada Lavrač

JMLR 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The experimental evaluation of the NetSDM methodology on acute lymphoblastic leukemia and breast cancer data demonstrates that NetSDM achieves radical time efficiency improvements and that the learned rules are comparable to or better than the rules obtained by the original SDM algorithms.
Researcher Affiliation	Academia	Jan Kralj: Jožef Stefan Institute, Department of Knowledge Technologies, Jamova 39, 1000 Ljubljana, Slovenia. Marko Robnik-Šikonja: University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113, 1000 Ljubljana, Slovenia. Nada Lavrač: Jožef Stefan Institute, Department of Knowledge Technologies, Jamova 39, 1000 Ljubljana, Slovenia.
Pseudocode	Yes	Algorithm 1: The NetSDM algorithm, implementing the proposed approach to semantic data mining with network node ranking and ontology shrinking. Algorithm 2: The algorithm for removing a node from a network obtained through direct conversion of the background knowledge into the information network format.
Open Source Code	No	The paper does not provide concrete access to its own source code for the NetSDM methodology. It includes a link to the Aleph manual, but that concerns a third-party tool, not the authors' implementation.
Open Datasets	Yes	ALL (acute lymphoblastic leukemia) data. The ALL data set, introduced by Chiaretti et al. (2004), is a typical dataset for medical research. Breast cancer data. The breast cancer data set, introduced by Sotiriou et al. (2006), contains gene expression data on patients suffering from breast cancer. ...Gene Ontology (Ashburner et al., 2000), which was used as the background knowledge in our experiments.
Dataset Splits	No	The paper mentions subsets of genes (e.g., "1,000 enriched genes... from a set of 10,000 genes" and "990 interesting genes out of a total of 12,019 genes") and distinguishes between positive and negative examples. However, beyond these descriptions of the target sets, it does not specify training, validation, or test splits, split percentages, or a splitting methodology.
Hardware Specification	Yes	We timed the algorithm on the ALL data set using different settings for the beam, depth and support on an 8-core 2.60 GHz Intel Xeon(R) E5-2697 v3 machine with 64 GB of RAM.
Software Dependencies	No	The paper mentions the use of the Hedwig and Aleph algorithms and refers to the 'Aleph Manual, 1999'. However, it does not give version numbers for the implementations of these algorithms used in the experiments, or for any other key software libraries.
Experiment Setup	Yes	Using Hedwig, we ran the algorithm with all combinations of depth (1 or 10), beam width (1 or 10) and support (0.1 or 0.01). For Aleph, we ran the algorithm using the settings recommended by the algorithm's author: the minimum number of positive examples covered by a rule was set to 10, and the maximum number of negative examples covered by a rule was set to 100.
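The reported experiment setup amounts to a small parameter sweep. Hedwig is run on every combination of two depth values, two beam widths and two support thresholds, which gives 2 × 2 × 2 = 8 runs. Aleph is run with a single coverage constraint on acceptable rules.

The Python sketch below makes that grid explicit. It does not call Hedwig or Aleph. The dictionary keys (`depth`, `beam`, `support`, `min_positives`, `max_negatives`) and the helpers `hedwig_configurations` and `aleph_accepts` are hypothetical names chosen for illustration. They are not the command-line flags or settings of either tool.

```python
from itertools import product

# Values as reported for the Hedwig runs; the key names are hypothetical.
HEDWIG_GRID = {
    "depth": (1, 10),
    "beam": (1, 10),
    "support": (0.1, 0.01),
}

# Aleph thresholds as reported; the key names are hypothetical, not Aleph settings.
ALEPH_SETTINGS = {"min_positives": 10, "max_negatives": 100}


def hedwig_configurations(grid=HEDWIG_GRID):
    """Return every combination of the grid values as a list of dicts."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def aleph_accepts(pos_covered, neg_covered, settings=ALEPH_SETTINGS):
    """Hypothetical check: a rule is acceptable if it covers at least
    min_positives positive and at most max_negatives negative examples."""
    return (pos_covered >= settings["min_positives"]
            and neg_covered <= settings["max_negatives"])


if __name__ == "__main__":
    configs = hedwig_configurations()
    for cfg in configs:
        print(cfg)
    print(len(configs), "Hedwig runs")
```

Running the script lists the eight Hedwig configurations, one per line. Keeping the grid in a single dictionary makes it easy to check that a rerun covers exactly the combinations the paper reports.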