Infinite-dimensional Mahalanobis Distance with Applications to Kernelized Novelty Detection
Authors: Nikita Zozoulenko, Thomas Cass, Lukas Gonon
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In an empirical study on 12 real-world data sets, we demonstrate that the kernelized nearest-neighbour Mahalanobis distance outperforms the traditional kernelized Mahalanobis distance for multivariate time series novelty detection, using state-of-the-art time series kernels such as the signature, global alignment, and Volterra reservoir kernels. |
| Researcher Affiliation | Academia | Nikita Zozoulenko (EMAIL), Department of Mathematics, Imperial College London, UK; Thomas Cass (EMAIL), Department of Mathematics, Imperial College London, UK, and Institute for Advanced Study, Princeton, USA; Lukas Gonon (EMAIL), School of Computer Science, University of St. Gallen, Switzerland, and Department of Mathematics, Imperial College London, UK |
| Pseudocode | Yes | The full procedure for computing the kernelized Mahalanobis distance and its nearest-neighbour variant (conformance score) is detailed in Algorithms 1 to 3. |
| Open Source Code | Yes | Our experiment code is publicly available at https://github.com/nikitazozoulenko/kernel-timeseries-anomaly-detection. |
| Open Datasets | Yes | We will use the UEA multivariate time series repository (Bagnall et al., 2018; Ruiz et al., 2021) in our experimentation, which in recent years has become a standard benchmark for multivariate time series classification. |
| Dataset Splits | Yes | The optimal kernel hyper-parameters were obtained separately for the Mahalanobis distance and the conformance score, via a 10 times repeated 4-fold cross-validation on the training data for each data set. The objective score used in the cross-validation was the sum of ROC-AUC and PR-AUC. When calculating the precision-recall metric, we let the non-outlier class be the positive class. The final model was then evaluated on the out-of-sample test set to obtain the final results, presented in Table 2 and Table 3. |
| Hardware Specification | No | In our open-source code we provide efficient PyTorch implementations of each kernel on both GPU and CPU, as well as an implementation of Algorithms 1 to 3 for computing the kernelized Mahalanobis distance and the kernelized conformance score. We acknowledge computational resources and support provided by the Imperial College Research Computing Service (DOI: 10.14469/hpc/2232). |
| Software Dependencies | No | In our open-source code we provide efficient PyTorch implementations of each kernel on both GPU and CPU, as well as an implementation of Algorithms 1 to 3 for computing the kernelized Mahalanobis distance and the kernelized conformance score. |
| Experiment Setup | Yes | For each kernel, data set, and class label, we run an extensive grid search on the designated training set using repeated k-fold cross-validation with 4 folds and 10 repeats to find the optimal kernel hyper-parameters. Let R^d be the state space, and let T be the length of the time series for a given data set. For each method using the RBF static kernel we use the range σ ∈ √d · {e^-2, e^-1, 1, e^1, e^2}, and similarly for the polynomial kernel we use p ∈ {2, 3, 4}, c ∈ {-2, 1, 2, 4}. For the GAK kernel we use the previously specified σ without the √d term, multiplied by √T · med(‖x − y‖) as is recommended by Cuturi (2011). For the VRK kernel we use τ ∈ {1/2, 1}, and we let λ vary from 0.25 to 0.999 on an inverse logarithmic grid of size 10. The signature kernels inherit hyper-parameters from their respective static kernels. We additionally scale the kernel-lifted paths by s ∈ {1/2, 1, 2, 4} for the truncated signature, and s ∈ {1/4, 1/2, 1} for the untruncated PDE kernel. We use lower values for the PDE signature kernel since untruncated signatures can essentially be viewed as tensor exponentials, whose inner products will blow up if the input values are too large. For the truncated signature kernel we let the truncation level be in {1, 2, 3, 4, 5, 6, 7}. For the randomized signature we use the tanh activation function with the number of features in {10, 25, 50, 100, 200}, and random matrix variances taken from a logarithmic grid of size 8 from 0.00001 to 1. Since the randomized signature is a randomized kernel, we perform the cross-validation with 5 different random seeds for the random matrix initializations and take the best-performing model (using the training set only), as is common practice for randomized methods. Furthermore, for each method we also cross-validate over the Tikhonov regularization parameter α ∈ {10^-8, 10^-5, 10^-2}, whether to concatenate time as an additional dimension to each time series, and the eigenvalue threshold λ in Algorithm 1. We set an upper limit of 50 eigenvalues for the computation of the variance norm. For numerical stability, and to make the choices of α and λ comparable across all kernels and data sets, we normalize all time series kernels K in feature space via K(x,y) / √(K(x,x) K(y,y)). |
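The feature-space normalization quoted in the last row, K(x,y) / √(K(x,x) K(y,y)), can be sketched in a few lines of NumPy. The helper name `normalize_gram` is ours, not taken from the paper's code:

```python
import numpy as np

def normalize_gram(K):
    """Normalize a Gram matrix in feature space: K(x,y) / sqrt(K(x,x) K(y,y))."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

# Toy Gram matrix from a linear kernel on three points in R^2.
X = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
K = X @ X.T
Kn = normalize_gram(K)   # unit diagonal; Kn[0, 1] == 1.0 since x1 and x2 are parallel
```

After normalization every point has unit norm in feature space, so the diagonal is identically 1 and all off-diagonal entries lie in [-1, 1], which is what makes a single grid for α and λ comparable across kernels and data sets.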
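Algorithms 1 to 3 themselves are not reproduced on this page. As a rough illustration of how a kernelized Mahalanobis distance with an eigenvalue threshold, an eigenvalue cap, and Tikhonov regularization can be computed from Gram-matrix data alone, the following NumPy sketch uses one standard eigendecomposition-based formulation; the function name, the centering details, and the exact placement of the regularizer α are our assumptions and may differ from the paper's Algorithm 1:

```python
import numpy as np

def kernel_mahalanobis(K, k_test, alpha=1e-5, eig_threshold=1e-10, max_eigs=50):
    """Regularized Mahalanobis-type distance of a test point from kernel values only.

    K      : (n, n) Gram matrix of the training set
    k_test : (n,) kernel evaluations between the test point and the training set
    """
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    Kc = H @ K @ H                                # centered Gram matrix
    kc = H @ (k_test - K.mean(axis=1))            # centered test kernel vector

    evals, evecs = np.linalg.eigh(Kc)             # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1][:max_eigs]    # keep at most 50 eigenvalues
    evals, evecs = evals[order], evecs[:, order]
    keep = evals > eig_threshold                  # eigenvalue threshold
    evals, evecs = evals[keep], evecs[:, keep]

    proj = (evecs.T @ kc) / np.sqrt(evals)        # coordinates in the kernel PCA basis
    return float(np.sum(proj ** 2 / (evals / n + alpha)))  # Tikhonov-regularized sum

# Illustration with a linear kernel: an outlier scores higher than the training mean.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
K = X @ X.T
d_near = kernel_mahalanobis(K, X @ X.mean(axis=0))
d_far = kernel_mahalanobis(K, X @ (X.mean(axis=0) + 10.0))
```

With the linear kernel this reduces to the ordinary regularized Mahalanobis distance in R^d, which is why the far-away point receives a much larger score than the empirical mean.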
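The repeated 4-fold cross-validation with 10 repeats amounts to 40 train/validation splits per hyper-parameter configuration. A minimal index generator for such a scheme looks as follows; `repeated_kfold_indices` is a hypothetical helper for illustration (the paper's code may instead rely on an off-the-shelf splitter):

```python
import numpy as np

def repeated_kfold_indices(n, n_splits=4, n_repeats=10, seed=0):
    """Yield (train_idx, val_idx) index pairs for repeated k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        perm = rng.permutation(n)                 # fresh shuffle for every repeat
        folds = np.array_split(perm, n_splits)    # near-equal folds
        for i in range(n_splits):
            val = folds[i]
            train = np.concatenate([folds[j] for j in range(n_splits) if j != i])
            yield train, val

splits = list(repeated_kfold_indices(n=20))       # 4 folds x 10 repeats = 40 splits
```

For each split one would fit on `train`, score `val` with the sum of ROC-AUC and PR-AUC (non-outlier as the positive class, as stated above), and average across all 40 splits to rank configurations.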