The weight function in the subtree kernel is decisive

Authors: Romain Azaïs, Florian Ingels

JMLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "This section is dedicated to the application of the methodology developed in the paper to eight real data sets with various characteristics in order to show its strengths and weaknesses. The related questions are supervised classification problems. As mentioned in Subsection 3.3, our approach consists in computing the Gram matrices of the subtree kernel via DAG reduction and with a new weight function called the discriminance (see Section 4). In particular, we aim to compare the usual exponential weight of the literature and the latter in terms of prediction capability. In all the sequel, the Gram matrices are used as inputs to SVM algorithms in order to tackle these classification problems."
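The pipeline this response describes — precomputed Gram matrices fed to SVMs — can be sketched with scikit-learn's precomputed-kernel mode. The toy kernel below (a multiset-intersection count over placeholder "subtree hashes") is a stand-in for the paper's subtree kernel, not the authors' implementation:

```python
import numpy as np
from sklearn.svm import SVC

# Toy "trees" represented as multisets of subtree hashes; the real subtree
# kernel would count shared subtrees, each weighted by the weight function.
def toy_kernel(a, b):
    # Kernel value = size of the multiset intersection (a PSD kernel).
    return sum(min(a.count(h), b.count(h)) for h in set(a))

data = [[1, 2, 2], [1, 2], [3, 4], [3, 3, 4]]
labels = [0, 0, 1, 1]

# Precompute the Gram matrix K[i, j] = k(x_i, x_j).
K = np.array([[toy_kernel(a, b) for b in data] for a in data], dtype=float)

clf = SVC(kernel="precomputed")
clf.fit(K, labels)

# Prediction needs kernel values of the new points against the training set.
tests = [[1, 2], [3, 4]]
K_test = np.array([[toy_kernel(t, b) for b in data] for t in tests], dtype=float)
print(clf.predict(K_test))
```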
Researcher Affiliation | Academia | Romain Azaïs, Laboratoire Reproduction et Développement des Plantes, Univ Lyon, ENS de Lyon, UCB Lyon 1, CNRS, INRAE, Inria, F-69342, Lyon, France
Pseudocode | Yes | Algorithm 1: Dag Recompression
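The report only names the algorithm. The idea behind DAG reduction — storing each distinct subtree once so that repeated subtrees collapse into a single node — can be sketched as follows. This is a hypothetical illustration on unordered unlabeled trees encoded as nested tuples, not the authors' treex implementation:

```python
def dag_reduce(tree, table=None):
    """tree is a tuple of child trees, e.g. ((), ((),)) for a 3-node tree.

    Returns (key, table): `key` is a canonical form of the tree (sorted
    canonical forms of its children), and `table` maps each distinct
    subtree key to [child_keys, multiplicity] — the nodes of the DAG.
    """
    if table is None:
        table = {}
    # Recursively canonicalize children; sorting makes the form order-free.
    child_keys = tuple(sorted(dag_reduce(c, table)[0] for c in tree))
    node = table.setdefault(child_keys, [child_keys, 0])
    node[1] += 1  # count how often this subtree occurs
    return child_keys, table

# The two identical subtrees ((),) collapse onto one DAG node:
tree = (((),), ((),))
key, table = dag_reduce(tree)
print(len(table))  # number of distinct subtrees = number of DAG nodes
```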
Open Source Code | Yes | "The treex library for Python (Azaïs et al., 2019) is designed to manipulate rooted trees... We implemented the subtree kernel as a module of treex so that the interested reader can manipulate the concepts discussed in this paper in a ready-to-use manner. ... Installing instructions and the documentation of treex can be found from Azaïs et al. (2019)."
Open Datasets | Yes | "For each data set, and each language, we picked Wikipedia articles at random using the Wikipedia API... INEX 2005 and 2006: These data sets originate from the INEX competition (Denoyer and Gallinari, 2007)... The Vascusynth data set from Hamarneh and Jassi (2010); Jassi and Hamarneh (2011)... From the encoding of the data that they have provided as a supplementary material (https://doi.org/10.1101/267450, last accessed in April 2020), we have extracted ordered unlabeled trees... Faure et al. (2015) have developed a method to construct cell lineage trees from microscopy and provided their data online (https://bioemergences.eu/bioemergences/openworkflow-datasets.php, last accessed in April 2020). ... The LOGML data set is made of user sessions on an academic website, namely the Rensselaer Polytechnic Institute Computer Science Department website (https://science.rpi.edu/computer-science, last accessed in April 2020)."
Dataset Splits | Yes | "In a second time, we evaluated the performance of the subtree kernel on a classification task via two methods: (i) for exponential weights τ ↦ λ^H(τ), we randomly split the data in thirds, two for training a SVM, and one for prediction; (ii) for discriminance weight, we also randomly split the data in thirds, one for training the discriminance weight, one for training a SVM, and the last one for prediction."
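The two split protocols quoted above can be sketched with scikit-learn's train_test_split (a minimal sketch on placeholder data; variable names are illustrative, not the authors' code):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data set of 90 samples with binary labels.
X = np.arange(90).reshape(90, 1)
y = np.arange(90) % 2

# (i) exponential weight: two thirds train the SVM, one third is for prediction.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, random_state=0)

# (ii) discriminance weight: three equal parts — one to learn the weight,
# one to train the SVM, one for prediction.
X_rest, X_pred, y_rest, y_pred = train_test_split(
    X, y, test_size=1 / 3, random_state=0)
X_weight, X_svm, y_weight, y_svm = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

print(len(X_weight), len(X_svm), len(X_pred))  # 30 30 30
```

The paper repeats such random splits 50 times for the discriminance weight; that would correspond to looping this with varying random_state values.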
Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or computing infrastructure) are mentioned in the paper for the experiments conducted. General terms such as 'SVM algorithms' are used without specifying the underlying hardware.
Software Dependencies | No | "We used the implementation available in the scikit-learn Python library, via the two functions accuracy_score and precision_recall_fscore_support. ... The treex library for Python (Azaïs et al., 2019)... Resorting to dependencies to scikit-learn, tools for processing databases and compute SVM are also provided for the sake of self-containedness."
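For reference, a minimal usage of the two scikit-learn metric functions named in this response (the labels below are illustrative, not the paper's data):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]

# Fraction of exactly matching labels: 3 of 5 correct.
acc = accuracy_score(y_true, y_pred)

# Per-class precision/recall/F1; average="binary" reports the positive class.
prec, rec, f1, support = precision_recall_fscore_support(
    y_true, y_pred, average="binary")

print(acc)        # 0.6
print(prec, rec)  # both 2/3: 2 true positives out of 3 predicted / 3 actual
```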
Experiment Setup | No | "In all the sequel, the Gram matrices are used as inputs to SVM algorithms in order to tackle these classification problems. ... for exponential weights τ ↦ λ^H(τ), we randomly split the data in thirds, two for training a SVM, and one for prediction; (ii) for discriminance weight, we also randomly split the data in thirds, one for training the discriminance weight, one for training a SVM, and the last one for prediction. We repeated 50 times this random split for discriminance, and for different values of λ. ... In the sequel, we chose ω_ν = f(1 − δ_ν) with the smoothstep function f : x ↦ 3x² − 2x³." While the paper mentions the use of SVM algorithms and specifies the smoothstep function for the discriminance weight, it lacks concrete hyperparameters for the SVM itself (e.g., C, kernel parameters, specific solver settings).
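The two weight functions quoted above are simple to write down: the exponential weight λ^H(τ) (with H the height of the subtree) and the discriminance-based weight ω_ν = f(1 − δ_ν) with the smoothstep f(x) = 3x² − 2x³. A minimal sketch, where δ_ν is treated as an abstract score in [0, 1] rather than the paper's actual discriminance computation:

```python
def smoothstep(x):
    """Smoothstep f(x) = 3x^2 - 2x^3: maps [0, 1] -> [0, 1] with f(0)=0, f(1)=1."""
    return 3 * x**2 - 2 * x**3

def exponential_weight(height, lam=0.5):
    """Usual exponential weight of the literature: lambda ** H(tau)."""
    return lam ** height

def discriminance_weight(delta):
    """Weight omega_nu = f(1 - delta_nu); delta here is a placeholder score."""
    return smoothstep(1 - delta)

print(smoothstep(0.0), smoothstep(0.5), smoothstep(1.0))  # 0.0 0.5 1.0
print(exponential_weight(3))                              # 0.125
```

Note that smoothstep fixes f(0) = 0 and f(1) = 1 with zero derivative at both ends, so subtrees with scores near the extremes get saturated weights.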