NetSDM: Semantic Data Mining with Network Analysis
Authors: Jan Kralj, Marko Robnik-Šikonja, Nada Lavrač
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental evaluation of the NetSDM methodology on acute lymphoblastic leukemia and breast cancer data demonstrates that NetSDM achieves radical time efficiency improvements and that learned rules are comparable or better than the rules obtained by the original SDM algorithms. |
| Researcher Affiliation | Academia | Jan Kralj (Jožef Stefan Institute, Department of Knowledge Technologies, Jamova 39, 1000 Ljubljana, Slovenia); Marko Robnik-Šikonja (University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113, 1000 Ljubljana, Slovenia); Nada Lavrač (Jožef Stefan Institute, Department of Knowledge Technologies, Jamova 39, 1000 Ljubljana, Slovenia) |
| Pseudocode | Yes | Algorithm 1: The NetSDM algorithm, implementing the proposed approach to semantic data mining with network node ranking and ontology shrinking. Algorithm 2: The algorithm for removing a node from a network, obtained through direct conversion of the background knowledge into the information network format. |
| Open Source Code | No | The paper does not provide concrete access to its own source code for the NetSDM methodology. It mentions the Aleph manual link, but this is for a third-party tool, not the authors' implementation for this paper. |
| Open Datasets | Yes | ALL (acute lymphoblastic leukemia) data. The ALL data set, introduced by Chiaretti et al. (2004), is a typical dataset for medical research. Breast cancer data. The breast cancer data set, introduced by Sotiriou et al. (2006), contains gene expression data on patients suffering from breast cancer. ...Gene Ontology (Ashburner et al., 2000), which was used as the background knowledge in our experiments. |
| Dataset Splits | No | The paper mentions subsets of genes (e.g., "1,000 enriched genes... from a set of 10,000 genes" and "990 interesting genes out of a total of 12,019 genes") and distinguishes between positive and negative examples, but it does not specify explicit training, validation, or test dataset splits, percentages, or methodology for reproducibility beyond these overall descriptions of target sets. |
| Hardware Specification | Yes | We timed the algorithm on the ALL data set using different settings for the beam, depth and support on 8 core 2.60 GHz Intel Xeon(R) E5-2697 v3 machine with 64GB of RAM. |
| Software Dependencies | No | The paper mentions the use of Hedwig and Aleph algorithms, and refers to the 'Aleph Manual, 1999'. However, it does not provide specific version numbers for the implementations of these algorithms or any other key software libraries used in their experiments. |
| Experiment Setup | Yes | Using Hedwig, we ran the algorithm with all combinations of depth (1 or 10), beam width (1 or 10) and support (0.1 or 0.01). For Aleph, we ran the algorithm using the settings recommended by the algorithm author: minimum number of positive examples covered by a rule was set to 10, and maximum number of negative examples covered by a rule was set to 100. |
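
The Hedwig settings quoted in the Experiment Setup row form a small grid of 2 × 2 × 2 = 8 configurations. The sketch below only enumerates that grid; it does not call Hedwig or Aleph. The function name `hedwig_grid` and the dictionary keys, including those in `ALEPH_SETTINGS`, are illustrative labels chosen here, not parameter names taken from either tool.

```python
from itertools import product

# Hedwig values as reported in the Experiment Setup row.
DEPTHS = (1, 10)
BEAM_WIDTHS = (1, 10)
SUPPORTS = (0.1, 0.01)

# Aleph thresholds from the same row. The key names are hypothetical
# labels for this sketch, not Aleph option names.
ALEPH_SETTINGS = {
    "min_positives_covered": 10,
    "max_negatives_covered": 100,
}


def hedwig_grid():
    """Return every (depth, beam width, support) combination as a dict."""
    return [
        {"depth": depth, "beam": beam, "support": support}
        for depth, beam, support in product(DEPTHS, BEAM_WIDTHS, SUPPORTS)
    ]


if __name__ == "__main__":
    # Print one line per configuration that would need to be run.
    for config in hedwig_grid():
        print(config)
```

Each dictionary describes one run to repeat when reproducing the timing and rule-quality comparisons. How these values are passed to Hedwig depends on its interface, which the summary above does not specify.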