The weight function in the subtree kernel is decisive

Authors: Romain Azaïs, Florian Ingels

JMLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "This section is dedicated to the application of the methodology developed in the paper to eight real data sets with various characteristics in order to show its strengths and weaknesses. The related questions are supervised classification problems. As mentioned in Subsection 3.3, our approach consists in computing the Gram matrices of the subtree kernel via DAG reduction and with a new weight function called the discriminance (see Section 4). In particular, we aim to compare the usual exponential weight of the literature and the latter in terms of prediction capability. In all the sequel, the Gram matrices are used as inputs to SVM algorithms in order to tackle these classification problems."
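The pipeline this response describes — precomputed Gram matrices fed to SVMs — can be sketched with scikit-learn's precomputed-kernel mode. The toy kernel below (a multiset-intersection count over placeholder "subtree hashes") is a stand-in for the paper's subtree kernel, not the authors' implementation:

```python
import numpy as np
from sklearn.svm import SVC

# Toy "trees" represented as multisets of subtree hashes; the real subtree
# kernel would count shared subtrees, each weighted by the weight function.
def toy_kernel(a, b):
    # Kernel value = size of the multiset intersection (a PSD kernel).
    return sum(min(a.count(h), b.count(h)) for h in set(a))

data = [[1, 2, 2], [1, 2], [3, 4], [3, 3, 4]]
labels = [0, 0, 1, 1]

# Precompute the Gram matrix K[i, j] = k(x_i, x_j).
K = np.array([[toy_kernel(a, b) for b in data] for a in data], dtype=float)

clf = SVC(kernel="precomputed")
clf.fit(K, labels)

# Prediction needs kernel values of the new points against the training set.
tests = [[1, 2], [3, 4]]
K_test = np.array([[toy_kernel(t, b) for b in data] for t in tests], dtype=float)
print(clf.predict(K_test))
```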
Researcher Affiliation | Academia | Romain Azaïs, Laboratoire Reproduction et Développement des Plantes, Univ Lyon, ENS de Lyon, UCB Lyon 1, CNRS, INRAE, Inria, F-69342, Lyon, France
Pseudocode | Yes | Algorithm 1: Dag Recompression
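The report only names the algorithm. The idea behind DAG reduction — storing each distinct subtree once so that repeated subtrees collapse into a single node — can be sketched as follows. This is a hypothetical illustration on unordered unlabeled trees encoded as nested tuples, not the authors' treex implementation:

```python
def dag_reduce(tree, table=None):
    """tree is a tuple of child trees, e.g. ((), ((),)) for a 3-node tree.

    Returns (key, table): `key` is a canonical form of the tree (sorted
    canonical forms of its children), and `table` maps each distinct
    subtree key to [child_keys, multiplicity] — the nodes of the DAG.
    """
    if table is None:
        table = {}
    # Recursively canonicalize children; sorting makes the form order-free.
    child_keys = tuple(sorted(dag_reduce(c, table)[0] for c in tree))
    node = table.setdefault(child_keys, [child_keys, 0])
    node[1] += 1  # count how often this subtree occurs
    return child_keys, table

# The two identical subtrees ((),) collapse onto one DAG node:
tree = (((),), ((),))
key, table = dag_reduce(tree)
print(len(table))  # number of distinct subtrees = number of DAG nodes
```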
Open Source Code | Yes | "The treex library for Python (Azaïs et al., 2019) is designed to manipulate rooted trees... We implemented the subtree kernel as a module of treex so that the interested reader can manipulate the concepts discussed in this paper in a ready-to-use manner. ... Installing instructions and the documentation of treex can be found from Azaïs et al. (2019)."
Open Datasets | Yes | "For each data set, and each language, we picked Wikipedia articles at random using the Wikipedia API... INEX 2005 and 2006: These data sets originate from the INEX competition (Denoyer and Gallinari, 2007)... The Vascusynth data set from Hamarneh and Jassi (2010); Jassi and Hamarneh (2011)... From the encoding of the data that they have provided as a supplementary material (https://doi.org/10.1101/267450, last accessed in April 2020), we have extracted ordered unlabeled trees... Faure et al. (2015) have developed a method to construct cell lineage trees from microscopy and provided their data online (https://bioemergences.eu/bioemergences/openworkflow-datasets.php, last accessed in April 2020). ... The LOGML data set is made of user sessions on an academic website, namely the Rensselaer Polytechnic Institute Computer Science Department website (https://science.rpi.edu/computer-science, last accessed in April 2020)."
Dataset Splits | Yes | "In a second time, we evaluated the performance of the subtree kernel on a classification task via two methods: (i) for exponential weights τ ↦ λ^H(τ), we randomly split the data in thirds, two for training a SVM, and one for prediction; (ii) for discriminance weight, we also randomly split the data in thirds, one for training the discriminance weight, one for training a SVM, and the last one for prediction."
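The two split protocols quoted above can be sketched with scikit-learn's train_test_split (a minimal sketch on placeholder data; variable names are illustrative, not the authors' code):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data set of 90 samples with binary labels.
X = np.arange(90).reshape(90, 1)
y = np.arange(90) % 2

# (i) exponential weight: two thirds train the SVM, one third is for prediction.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, random_state=0)

# (ii) discriminance weight: three equal parts — one to learn the weight,
# one to train the SVM, one for prediction.
X_rest, X_pred, y_rest, y_pred = train_test_split(
    X, y, test_size=1 / 3, random_state=0)
X_weight, X_svm, y_weight, y_svm = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

print(len(X_weight), len(X_svm), len(X_pred))  # 30 30 30
```

The paper repeats such random splits 50 times for the discriminance weight; that would correspond to looping this with varying random_state values.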
Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or computing infrastructure) are mentioned in the paper for the experiments conducted. General terms such as 'SVM algorithms' are used without specifying the underlying hardware.
Software Dependencies | No | "We used the implementation available in the scikit-learn Python library, via the two functions accuracy_score and precision_recall_fscore_support. ... The treex library for Python (Azaïs et al., 2019)... Resorting to dependencies to scikit-learn, tools for processing databases and compute SVM are also provided for the sake of self-containedness."
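For reference, a minimal usage of the two scikit-learn metric functions named in this response (the labels below are illustrative, not the paper's data):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]

# Fraction of exactly matching labels: 3 of 5 correct.
acc = accuracy_score(y_true, y_pred)

# Per-class precision/recall/F1; average="binary" reports the positive class.
prec, rec, f1, support = precision_recall_fscore_support(
    y_true, y_pred, average="binary")

print(acc)        # 0.6
print(prec, rec)  # both 2/3: 2 true positives out of 3 predicted / 3 actual
```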
Experiment Setup | No | "In all the sequel, the Gram matrices are used as inputs to SVM algorithms in order to tackle these classification problems. ... for exponential weights τ ↦ λ^H(τ), we randomly split the data in thirds, two for training a SVM, and one for prediction; (ii) for discriminance weight, we also randomly split the data in thirds, one for training the discriminance weight, one for training a SVM, and the last one for prediction. We repeated 50 times this random split for discriminance, and for different values of λ. ... In the sequel, we chose ω_ν = f(1 − δ_ν) with the smoothstep function f : x ↦ 3x² − 2x³." While the paper mentions the use of SVM algorithms and specifies the smoothstep function for the discriminance weight, it lacks concrete hyperparameters for the SVM itself (e.g., C, kernel parameters, specific solver settings).
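The two weight functions quoted above are simple to write down: the exponential weight λ^H(τ) (with H the height of the subtree) and the discriminance-based weight ω_ν = f(1 − δ_ν) with the smoothstep f(x) = 3x² − 2x³. A minimal sketch, where δ_ν is treated as an abstract score in [0, 1] rather than the paper's actual discriminance computation:

```python
def smoothstep(x):
    """Smoothstep f(x) = 3x^2 - 2x^3: maps [0, 1] -> [0, 1] with f(0)=0, f(1)=1."""
    return 3 * x**2 - 2 * x**3

def exponential_weight(height, lam=0.5):
    """Usual exponential weight of the literature: lambda ** H(tau)."""
    return lam ** height

def discriminance_weight(delta):
    """Weight omega_nu = f(1 - delta_nu); delta here is a placeholder score."""
    return smoothstep(1 - delta)

print(smoothstep(0.0), smoothstep(0.5), smoothstep(1.0))  # 0.0 0.5 1.0
print(exponential_weight(3))                              # 0.125
```

Note that smoothstep fixes f(0) = 0 and f(1) = 1 with zero derivative at both ends, so subtrees with scores near the extremes get saturated weights.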