Using Meta-mining to Support Data Mining Workflow Planning and Optimization
Authors: P. Nguyen, M. Hilario, A. Kalousis
JAIR 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the quality of the data mining workflows that the system produces on a collection of real world datasets coming from biology and show that it produces workflows that are significantly better than alternative methods that can only do workflow selection and not planning. |
| Researcher Affiliation | Academia | Phong Nguyen and Melanie Hilario: Department of Computer Science, University of Geneva, Switzerland. Alexandros Kalousis: Department of Business Informatics, University of Applied Sciences Western Switzerland, and Department of Computer Science, University of Geneva, Switzerland. |
| Pseudocode | No | The paper does not contain any sections explicitly labeled as 'Pseudocode' or 'Algorithm', nor does it present any structured, code-like algorithmic blocks. |
| Open Source Code | No | The paper mentions external tools and projects that it uses or relates to, such as the RapidMiner platform (Klinkenberg, Mierswa, & Fischer, 2007) and the e-LICO project (http://www.e-lico.eu, http://www.e-lico.eu/eproplan.html). However, the authors do not state that the source code for their own methodology is publicly available, and they give no link to a code repository of their own. |
| Open Datasets | Yes | To construct the base-level experiments, we have collected 65 real world datasets on genomic microarray or proteomic data related to cancer diagnosis or prognosis, mostly from The National Center for Biotechnology Information⁵... Footnote 5: http://www.ncbi.nlm.nih.gov/ |
| Dataset Splits | Yes | The performance measure we use is accuracy which we estimate using ten-fold cross-validation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or memory specifications. It only mentions the use of the RapidMiner DM suite for algorithm implementations. |
| Software Dependencies | Yes | Today's second generation knowledge discovery support systems (KDSS) allow complex modeling of workflows and contain several hundreds of operators; the RapidMiner platform (Klinkenberg, Mierswa, & Fischer, 2007), in its extended version with Weka (Hall et al., 2009) and R (R Core Team, 2013), proposes actually more than 500 operators... For all algorithms, we used the implementations provided in the RapidMiner DM suite (Klinkenberg et al., 2007). |
| Experiment Setup | Yes | When the planning goal g is the classification task, we will use as evaluation measure in our experiments the classification accuracy, estimated by ten-fold cross-validation, and do the significance testing using McNemar's test, with a significance level of 0.05... For the two meta-learning methods, we fixed the number N of nearest neighbors to five... For planning, we set manually the dataset kernel width parameter to τ x k = 0.04 and the workflow kernel width parameter to τ w k = 0.08... |
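
The significance testing described in the Experiment Setup row can be illustrated with a minimal sketch of McNemar's test on paired predictions. This is not the authors' code: the function name `mcnemar_exact`, the toy labels, and the choice of the exact binomial variant of the test are assumptions made for illustration, and the predictions are assumed to be pooled from the held-out folds of a ten-fold cross-validation.

```python
from math import comb

ALPHA = 0.05  # significance level quoted in the Experiment Setup row


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact (binomial) McNemar test on paired predictions.

    Hypothetical helper, not taken from the paper. Returns (b, c, p_value):
    b counts examples only classifier A gets right,
    c counts examples only classifier B gets right.
    """
    b = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a == t and o != t)
    c = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a != t and o == t)
    n = b + c
    if n == 0:
        # The two classifiers never disagree, so there is no evidence of a difference.
        return b, c, 1.0
    # Under the null hypothesis, each disagreement favours A or B with probability 1/2.
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy data (assumed): 15 examples, A wins 10 disagreements, B wins 1, 4 ties.
y_true = [1] * 15
pred_a = [1] * 10 + [0] + [1] * 4
pred_b = [0] * 10 + [1] + [1] * 4

b, c, p = mcnemar_exact(y_true, pred_a, pred_b)
print(b, c, round(p, 4))  # 10 1 0.0117
print("significant" if p < ALPHA else "not significant")  # significant
```

Only the examples on which the two workflows disagree enter the test, which is why it suits comparisons made on the same cross-validation folds.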