Improved Learning via k-DTW: A Novel Dissimilarity Measure for Curves
Authors: Amer Krivošija, Alexander Munteanu, André Nusser, Chris Schwiegelshohn
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We complement our theoretical findings with an experimental illustration of the benefits of using k-DTW for clustering and nearest neighbor classification. [...] In this section, we conduct experiments using our novel k-DTW dissimilarity measure to understand the effect of the increased robustness compared to the Fréchet distance and the improved relaxed triangle inequality compared to DTW in practice. To this end, we create a synthetic data set that highlights the properties of the different dissimilarity measures in agglomerative clustering. Subsequently, we perform l-nearest neighbor classification on several real-world data sets. |
| Researcher Affiliation | Academia | 1TU Dortmund, Germany 2University of Cologne, Germany 3Université Côte d'Azur, CNRS, Inria, France 4Aarhus University, Denmark. |
| Pseudocode | Yes | Algorithm 1 Computing the k-DTW distance — 1: Input: Curves σ = (v1, ..., vm) and τ = (w1, ..., wm′), parameter k; 2: Output: The k-DTW distance dk-DTW(σ, τ); 3: Initialize the distance matrix D[1..m, 1..m′] with D[i, j] ← ‖vi − wj‖; 4: Let array E[1..z+1] store the z distinct distances in D including 0 in sorted order E[1] > ... > E[z] > E[z+1] = 0; 5: Initialize mincost ← ∞; 6: for l ∈ {1, ..., z+1} do; 7: /* E[l] represents a current guess on the smallest element among the largest k distances */; 8: D′[i, j] ← max{D[i, j] − E[l], 0}, for all (i, j) ∈ [m] × [m′]; 9: /* update best solution by the DTW distance on the modified D′ matrix plus k·E[l] */; 10: mincost ← min{mincost, DTW(D′) + k·E[l]}; 11: end for; 12: Return: mincost |
| Open Source Code | Yes | Our Python code is available at https://github.com/akrivosija/k DTW. |
| Open Datasets | Yes | We use real-world data from the Open University Learning Analytics Dataset (OULAD) (Kuzilek et al., 2017) and the datasets from (Aghababa & Phillips, 2023). [...] Cars+Bus (Cruz et al., 2016) [...] Characters dataset (Williams, 2008) [...] Two Persons (UIC, 2006) |
| Dataset Splits | Yes | We run a 100 times repeated 6-fold cross validation, using l-NN with a standard choice of l = √n = 17 (cf. Devroye et al., 2013). [...] Specifically, we randomly split the OULAD dataset (Kuzilek et al., 2017) into training and test sets A and B, respectively, where the test set comprises a 1/3, 1/5 fraction of the input. We find the best value of k on A by running 100 independent repetitions of 6-fold cross-validation, using l = √|A| for the l-NN training and evaluate it on B. |
| Hardware Specification | Yes | All experiments were run on a HPC workstation with AMD Ryzen Threadripper PRO 5975WX, 32 cores at 3.6GHz, 512GB DDR4-3200. |
| Software Dependencies | No | Our Python code is available at https://github.com/akrivosija/k DTW. -> This only mentions 'Python code' without specifying a Python version or the libraries used, which is required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | We run a 100 times repeated 6-fold cross validation, using l-NN with a standard choice of l = √n = 17 (cf. Devroye et al., 2013). We perform an exponential search for the best parameter k ∈ {2^i | i ∈ [8]} for the k-DTW distance, and a subsequent finer search. The best results were obtained for k ∈ {64, 76}, which amounts to roughly 20% − 25% of the curves' complexity. |
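The pseudocode quoted in the Pseudocode row can be sketched in Python as follows. This is a minimal illustration of Algorithm 1, not the authors' released implementation; the names `dtw_on_matrix` and `k_dtw` are illustrative, and the DTW subroutine is the standard dynamic program over a precomputed cost matrix.

```python
import numpy as np

def dtw_on_matrix(D):
    """Classical DTW dynamic program over a precomputed cost matrix D."""
    m, n = D.shape
    C = np.full((m + 1, n + 1), np.inf)
    C[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # extend the cheapest of the three admissible predecessors
            C[i, j] = D[i - 1, j - 1] + min(C[i - 1, j], C[i, j - 1], C[i - 1, j - 1])
    return C[m, n]

def k_dtw(sigma, tau, k):
    """k-DTW per Algorithm 1: each distinct pairwise distance E[l] serves as a
    guess for the smallest element among the largest k matched distances."""
    sigma, tau = np.asarray(sigma, dtype=float), np.asarray(tau, dtype=float)
    # D[i, j] = ||v_i - w_j|| (Euclidean)
    D = np.linalg.norm(sigma[:, None, :] - tau[None, :, :], axis=2)
    # distinct distances in D including 0, in descending order
    E = np.unique(np.append(D, 0.0))[::-1]
    mincost = np.inf
    for e in E:
        Dp = np.maximum(D - e, 0.0)  # shift costs down by the guess, clamp at 0
        # DTW on the modified matrix plus k * E[l]
        mincost = min(mincost, dtw_on_matrix(Dp) + k * e)
    return mincost
```

Each candidate threshold E[l] costs one DTW evaluation, so the sketch runs z + 1 dynamic programs over the m × m′ cost matrix. Note that the guess e = 0 recovers plain DTW, so k-DTW never exceeds the DTW distance on the same pair of curves.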