Kernels for Sequentially Ordered Data

Authors: Franz J. Kiraly, Harald Oberhauser

JMLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform two experiments to validate the practical usefulness of the signature kernels: (1) On a real world data set of hand movement classification (eponymous UCI data set, Sapsanis et al. (2013)), we show the discretized signature kernel outperforms the best previously reported predictive performance (Sapsanis et al. (2013)), as well as non-sequential kernel and aggregate baselines. (2) On a real world data set of handwritten digit recognition (pendigits), we show that the discretized signature kernel over the Euclidean kernel (= linear use of signature features) achieves only sub-baseline performance. Using the discretized signature kernel over a Gaussian kernel improves prediction accuracy to the baseline region.
Researcher Affiliation | Academia | Franz J. Kiraly (EMAIL), Department of Statistical Science, University College London, London WC1E 6BT, United Kingdom; Harald Oberhauser (EMAIL), Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
Pseudocode | Yes | Algorithm 1: Computing the cumulative sum of a vector. Algorithm 2: Computing the cumulative sum of an array. Algorithm 3: Evaluation of k+_m. Algorithm 4: Evaluation of k+_m, with low-rank speed-up. Algorithm 5: Computation of the Gram matrix of k+_m, with (double) low-rank speed-up. Algorithm 6: Evaluation of k+_(d,m).
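The first two primitives in the paper's pseudocode are plain cumulative sums, which NumPy provides directly. A minimal sketch (the paper's exact index and shift conventions may differ; this only illustrates the vector and two-axis array variants):

```python
import numpy as np

def cumsum_vector(v):
    """Algorithm 1 primitive: cumulative sum of a vector."""
    return np.cumsum(v)

def cumsum_array(A):
    """Algorithm 2 primitive: cumulative sum of a 2D array,
    taken along both axes in sequence."""
    return np.cumsum(np.cumsum(A, axis=0), axis=1)

print(cumsum_vector(np.array([1, 2, 3])))      # [1 3 6]
print(cumsum_array(np.array([[1, 1], [1, 1]])))  # [[1 2] [2 4]]
```

These cumulative sums are the building blocks that the later algorithms (evaluation of k+_m and its low-rank variants) iterate over the sequence Gram matrix.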
Open Source Code | No | Section 7 presents a NumPy implementation and basic benchmarks. However, the paper does not explicitly state that the code is open source or provide a link to a repository.
Open Datasets | Yes | We performed classification with the eps-support vector machine (SVC) on the hand movements data set from UCI (Sapsanis et al. (2013)). We performed classification on the pendigits data set from the UCI repository. It contains 10992 samples of digits between 0 and 9 written by 44 different writers with a digital pen on a tablet. One sample consists of a pair of horizontal and vertical coordinates sampled at 8 different time points, hence we deal with a sequence in X^8 with X = R^2. The data set comes with a pre-specified training fold of 7494 samples and a test fold of 3498 samples.
Dataset Splits | Yes | In all experiments, we use nested (double) cross-validation for parameter tuning (inner loop) and estimation of error metrics (outer loop). In both instances of cross-validation, we perform uniform 5-fold cross-validation. The data set comes with a pre-specified training fold of 7494 samples and a test fold of 3498 samples.
Hardware Specification | No | The paper mentions 'contemporary desktop computers' in Section 6.3 but does not provide specific hardware details (e.g., CPU or GPU models, memory) for running the experiments.
Software Dependencies | No | For prediction, we use eps-support vector classification (as available in the python/scikit-learn package). Section 7 presents a NumPy implementation and basic benchmarks. The paper mentions 'python/scikit-learn' and 'NumPy' but does not provide specific version numbers for these software components.
Experiment Setup | Yes | In all experiments, we use nested (double) cross-validation for parameter tuning (inner loop) and estimation of error metrics (outer loop). In both instances of cross-validation, we perform uniform 5-fold cross-validation. Unless stated otherwise, parameters are tuned on the tuning grid given in Table 3 (when applicable). Kernel parameters are the same as in the above section on prediction methods. The best parameter is selected by 5-fold cross-validation, as the parameter yielding the minimum test-f1-score, averaged over the five folds.
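The nested (double) cross-validation setup described above can be sketched with scikit-learn: an inner 5-fold grid search tunes parameters, and an outer 5-fold loop estimates the error metric of the tuned model. The synthetic data and grid values below are illustrative placeholders, not the paper's datasets or its Table 3 tuning grid:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in data; the paper uses UCI sequence data sets.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Inner loop: 5-fold cross-validation for parameter tuning
# (grid values are illustrative only).
inner = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]}, cv=5)

# Outer loop: 5-fold cross-validation estimating the error metric
# of the model tuned by the inner loop.
scores = cross_val_score(inner, X, y, cv=5)
print(scores.mean())
```

Nesting the grid search inside the outer loop keeps parameter selection and error estimation on disjoint folds, so the reported score is not biased by the tuning procedure.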