Improved Learning via k-DTW: A Novel Dissimilarity Measure for Curves
Authors: Amer Krivošija, Alexander Munteanu, André Nusser, Chris Schwiegelshohn
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We complement our theoretical findings with an experimental illustration of the benefits of using k-DTW for clustering and nearest neighbor classification. [...] In this section, we conduct experiments using our novel k-DTW dissimilarity measure to understand the effect of the increased robustness compared to the Fréchet distance and the improved relaxed triangle inequality compared to DTW in practice. To this end, we create a synthetic data set that highlights the properties of the different dissimilarity measures in agglomerative clustering. Subsequently, we perform l-nearest neighbor classification on several real-world data sets. |
| Researcher Affiliation | Academia | 1TU Dortmund, Germany 2University of Cologne, Germany 3Université Côte d'Azur, CNRS, Inria, France 4Aarhus University, Denmark. |
| Pseudocode | Yes | Algorithm 1 Computing the k-DTW distance — 1: Input: Curves σ = (v1, ..., vm) and τ = (w1, ..., wm′), parameter k; 2: Output: The k-DTW distance dk-DTW(σ, τ); 3: Initialize the distance matrix D[1..m, 1..m′] with D[i, j] ← ‖vi − wj‖; 4: Let array E[1..z+1] store the z distinct distances in D including 0 in sorted order E[1] > ... > E[z] > E[z+1] = 0; 5: Initialize mincost ← ∞; 6: for l ∈ {1, ..., z+1} do; 7: /* E[l] represents a current guess on the smallest element among the largest k distances */; 8: D′[i, j] ← max{D[i, j] − E[l], 0}, for all (i, j) ∈ [m] × [m′]; 9: /* update best solution by the DTW distance on the modified D′ matrix plus k·E[l] */; 10: mincost ← min{mincost, DTW(D′) + k·E[l]}; 11: end for; 12: Return: mincost |
| Open Source Code | Yes | Our Python code is available at https://github.com/akrivosija/k DTW. |
| Open Datasets | Yes | We use real-world data from the Open University Learning Analytics Dataset (OULAD) (Kuzilek et al., 2017) and the datasets from (Aghababa & Phillips, 2023). [...] Cars+Bus (Cruz et al., 2016) [...] Characters dataset (Williams, 2008) [...] Two Persons (UIC, 2006) |
| Dataset Splits | Yes | We run a 100 times repeated 6-fold cross validation, using l-NN with a standard choice of l = √n = 17 (cf. Devroye et al., 2013). [...] Specifically, we randomly split the OULAD dataset (Kuzilek et al., 2017) into training and test sets A and B, respectively, where the test set comprises a 1/3, 1/5 fraction of the input. We find the best value of k on A by running 100 independent repetitions of 6-fold cross-validation, using l = √|A| for the l-NN training and evaluate it on B. |
| Hardware Specification | Yes | All experiments were run on a HPC workstation with AMD Ryzen Threadripper PRO 5975WX, 32 cores at 3.6GHz, 512GB DDR4-3200. |
| Software Dependencies | No | Our Python code is available at https://github.com/akrivosija/k DTW. -> This only mentions 'Python code' without specifying a Python version or the libraries used, which is required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | We run a 100 times repeated 6-fold cross validation, using l-NN with a standard choice of l = √n = 17 (cf. Devroye et al., 2013). We perform an exponential search for the best parameter k ∈ {2^i | i ∈ [8]} for the k-DTW distance, and a subsequent finer search. The best results were obtained for k ∈ {64, 76}, which amounts to roughly 20% − 25% of the curves' complexity. |
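The pseudocode quoted in the Pseudocode row can be sketched in Python as follows. This is a minimal illustration of Algorithm 1, not the authors' released implementation; the names `dtw_on_matrix` and `k_dtw` are illustrative, and the DTW subroutine is the standard dynamic program over a precomputed cost matrix.

```python
import numpy as np

def dtw_on_matrix(D):
    """Classical DTW dynamic program over a precomputed cost matrix D."""
    m, n = D.shape
    C = np.full((m + 1, n + 1), np.inf)
    C[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # extend the cheapest of the three admissible predecessors
            C[i, j] = D[i - 1, j - 1] + min(C[i - 1, j], C[i, j - 1], C[i - 1, j - 1])
    return C[m, n]

def k_dtw(sigma, tau, k):
    """k-DTW per Algorithm 1: each distinct pairwise distance E[l] serves as a
    guess for the smallest element among the largest k matched distances."""
    sigma, tau = np.asarray(sigma, dtype=float), np.asarray(tau, dtype=float)
    # D[i, j] = ||v_i - w_j|| (Euclidean)
    D = np.linalg.norm(sigma[:, None, :] - tau[None, :, :], axis=2)
    # distinct distances in D including 0, in descending order
    E = np.unique(np.append(D, 0.0))[::-1]
    mincost = np.inf
    for e in E:
        Dp = np.maximum(D - e, 0.0)  # shift costs down by the guess, clamp at 0
        # DTW on the modified matrix plus k * E[l]
        mincost = min(mincost, dtw_on_matrix(Dp) + k * e)
    return mincost
```

Each candidate threshold E[l] costs one DTW evaluation, so the sketch runs z + 1 dynamic programs over the m × m′ cost matrix. Note that the guess e = 0 recovers plain DTW, so k-DTW never exceeds the DTW distance on the same pair of curves.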