reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Tslearn, A Machine Learning Toolkit for Time Series Data

Authors: Romain Tavenard, Johann Faouzi, Gilles Vandewiele, Felix Divo, Guillaume Androz, Chester Holtz, Marie Payne, Roman Yurchak, Marc Rußwurm, Kushal Kolar, Eli Woods

JMLR 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The importance of providing time-series specific methods for machine learning is illustrated in the example below and the corresponding Figure 1, where standard Euclidean k-means fails while DTW-based ones (Sakoe and Chiba, 1978; Petitjean et al., 2011; Cuturi and Blondel, 2017) can distinguish between different time series profiles: from tslearn.clustering import TimeSeriesKMeans from tslearn.datasets import CachedDatasets # Load the "Trace" data set X_train = CachedDatasets().load_dataset("Trace")[0] # Define parameters for each metric euclidean_params = {"metric": "euclidean"} dba_params = {"metric": "dtw"} sdtw_params = {"metric": "softdtw", "metric_params": {"gamma": .01}} # Perform clustering for each metric y_preds = [] for params in (euclidean_params, dba_params, sdtw_params): km = TimeSeriesKMeans(n_clusters=3, random_state=0, **params) y_preds.append(km.fit_predict(X_train))
Researcher Affiliation	Collaboration	Romain Tavenard EMAIL Université de Rennes, CNRS, LETG-Rennes, IRISA-Obelix, Rennes, France Johann Faouzi EMAIL Aramis Lab, INRIA Paris, Paris Brain Institute, Paris, France Gilles Vandewiele EMAIL IDLab, Ghent University - imec, Ghent, Belgium Felix Divo EMAIL Technische Universität Darmstadt, Darmstadt, Germany Guillaume Androz EMAIL Icentia Inc., Québec, Canada Chester Holtz EMAIL Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA Marie Payne EMAIL McGill University, Montreal, Québec, Canada Roman Yurchak EMAIL Symerio, Paris, France Marc Rußwurm EMAIL Technical University of Munich, Chair of Remote Sensing Technology, Munich, Germany Kushal Kolar EMAIL Sars International Centre for Marine Molecular Biology, University of Bergen, Norway Eli Woods EMAIL Eaze Technologies, Inc., San Francisco, CA, USA
Pseudocode	No	The paper provides code snippets demonstrating the usage of the tslearn library's API, such as importing modules and fitting models, but it does not include structured pseudocode or algorithm blocks describing the underlying methods or procedures.
Open Source Code	Yes	tslearn is a general-purpose Python machine learning library for time series that offers tools for pre-processing and feature extraction as well as dedicated models for clustering, classification and regression. It follows scikit-learn's Application Programming Interface for transformers and estimators, allowing the use of standard pipelines and model selection tools on top of tslearn objects. It is distributed under the BSD-2-Clause license, and its source code is available at https://github.com/tslearn-team/tslearn.
Open Datasets	Yes	from tslearn.datasets import CachedDatasets # Load the "Trace" data set X_train = CachedDatasets().load_dataset("Trace")[0]
Dataset Splits	Yes	from sklearn.model_selection import KFold, GridSearchCV from tslearn.neighbors import KNeighborsTimeSeriesClassifier knn = KNeighborsTimeSeriesClassifier(metric="dtw") p_grid = {"n_neighbors": [1, 5]} cv = KFold(n_splits=2, shuffle=True, random_state=0) clf = GridSearchCV(estimator=knn, param_grid=p_grid, cv=cv) clf.fit(X, y)
Hardware Specification	No	The paper does not provide specific details regarding the hardware used for running experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies	Yes	tslearn v0.3.1 is a cross-platform software package for Python 3.5+. It depends on numpy (Van Der Walt et al., 2011) & scipy (Virtanen et al., 2020) packages for basic array manipulations and standard linear algebra routines and on scikit-learn (Pedregosa et al., 2011) for its API and utilities. It also utilizes Cython (Behnel et al., 2011), numba (Lam et al., 2015) and joblib (Varoquaux et al., 2010) for efficient computation. Finally, keras (Chollet et al., 2015) with tensorflow (Abadi et al., 2016) backend is an optional dependency that is necessary to use the shapelets module in tslearn that provides an efficient implementation of the shapelet model by Grabocka et al. (2014).
Experiment Setup	Yes	from sklearn.model_selection import KFold, GridSearchCV from tslearn.neighbors import KNeighborsTimeSeriesClassifier knn = KNeighborsTimeSeriesClassifier(metric="dtw") p_grid = {"n_neighbors": [1, 5]} cv = KFold(n_splits=2, shuffle=True, random_state=0) clf = GridSearchCV(estimator=knn, param_grid=p_grid, cv=cv) clf.fit(X, y)
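
The Research Type row quotes the paper's claim that Euclidean k-means fails where DTW-based variants succeed. The sketch below is one way to see why, using a textbook dynamic time warping recurrence written in plain Python. It is not tslearn's implementation and applies no Sakoe-Chiba band or other constraint; the function names `euclidean` and `dtw` and the toy series `s1` and `s2` are hypothetical and chosen only for illustration.

```python
import math


def euclidean(a, b):
    """Pointwise Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Unconstrained DTW distance using the classic O(len(a) * len(b)) recurrence."""
    inf = float("inf")
    # cost[i][j] holds the minimal accumulated squared cost of aligning a[:i] with b[:j].
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[len(a)][len(b)])


# Hypothetical toy data: the same bump, shifted by two time steps.
s1 = [0, 0, 1, 2, 1, 0, 0, 0]
s2 = [0, 0, 0, 0, 1, 2, 1, 0]

print("Euclidean:", euclidean(s1, s2))
print("DTW:", dtw(s1, s2))
```

On this toy pair the Euclidean distance penalizes the shift, while the warping path can align the two bumps exactly, so the DTW distance is zero. Real data sets such as Trace are noisier, so the gap is usually smaller than in this constructed case.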