Learning Soft Sparse Shapes for Efficient Time-Series Classification
Authors: Zhen Liu, Yicheng Luo, Boyuan Li, Emadeldeen Eldele, Min Wu, Qianli Ma
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the performance of various methods for TSC, we conduct experiments using the UCR time series archive (Dau et al., 2019), a widely recognized benchmark in TSC (Ismail Fawaz et al., 2019). Many datasets in the UCR archive contain a significantly higher number of test samples compared to the training samples. Also, the original UCR time series datasets lack a specific validation set, increasing the risk of overfitting in deep learning methods. Following (Dau et al., 2019; Ma et al., 2024), we merge the original training and test sets of each UCR dataset. These merged datasets are then divided into train-validation-test sets at a ratio of 60%-20%-20%. |
| Researcher Affiliation | Academia | 1 School of Computer Science and Engineering, South China University of Technology, Guangzhou, China. 2 Institute for Infocomm Research, Agency for Science, Technology and Research, Singapore. Correspondence to: Qianli Ma <EMAIL>, Min Wu <EMAIL>. |
| Pseudocode | Yes | Additionally, the pseudo-code for Soft Shape can be found in Algorithm 1 located in the Appendix. |
| Open Source Code | Yes | Our source code is available at https://github.com/qianlima-lab/SoftShape. |
| Open Datasets | Yes | To evaluate the performance of various methods for TSC, we conduct experiments using the UCR time series archive (Dau et al., 2019), a widely recognized benchmark in TSC (Ismail Fawaz et al., 2019). |
| Dataset Splits | Yes | Following (Dau et al., 2019; Ma et al., 2024), we merge the original training and test sets of each UCR dataset. These merged datasets are then divided into train-validation-test sets at a ratio of 60%-20%-20%. Following the experimental settings outlined in (Ma et al., 2024) and the recommendations of (Dau et al., 2019), we integrate the raw training and test sets and employ a five-fold cross-validation strategy to partition the dataset into training, validation, and test sets in a 3:1:1 ratio. Consistent with the approach in (Ma et al., 2024), each fold is sequentially designated as the test set, while the remaining four folds are randomly divided into training and validation sets in a 3:1 ratio. |
| Hardware Specification | Yes | Each experiment is conducted five times with five different random seeds, utilizing the PyTorch 1.12.1 platform and four NVIDIA GeForce RTX 3090 GPUs. |
| Software Dependencies | Yes | Each experiment is conducted five times with five different random seeds, utilizing the PyTorch 1.12.1 platform and four NVIDIA GeForce RTX 3090 GPUs. |
| Experiment Setup | Yes | We employ the Adam optimizer with a learning rate of 0.001 and a maximum training epoch of 500. Also, for all baseline method training, we implement a consistent early stopping strategy based on validation set loss values (Ma et al., 2024). For all UCR datasets, we apply a uniform normalization strategy to standardize each time series within the dataset (Ismail Fawaz et al., 2019). For datasets containing missing values, we use the mean of observed values at each specific timestamp across all samples in the training set to impute missing values at corresponding timestamps within the time series samples (Ma et al., 2024). Following (Wang et al., 2017), we set the batch size for model training as min(x_train.shape[0]/10, 16), where x_train.shape[0] represents the number of time series samples in the training set, and min(·) denotes the function that selects the minimum value. Additionally, we set the model depth L to 2. The number of activated experts k used for the Mixture of Experts (MoE) router in Equation (6) is set to 1, while the total number of experts Ĉ in MoE is defined as the total number of classes in each dataset. The parameter λ, which regulates the learning progression of the loss term in Equation (15), is set to 0.001. The sliding window size q for extracting subsequences is set to 4. The d_attn in Equation (3) is set to 8 (Early et al., 2024). |
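The five-fold split protocol quoted in the Dataset Splits row can be sketched as follows. This is a minimal illustration of the described procedure, not the authors' code: the function name, the random seed handling, and the use of NumPy are assumptions.

```python
import numpy as np

def five_fold_splits(n_samples, seed=0):
    """Yield (train, validation, test) index arrays for five folds.

    Each fold serves in turn as the test set; the remaining four folds
    are shuffled and divided into training and validation sets at a
    3:1 ratio, giving an overall 60%-20%-20% split per fold.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, 5)
    for i in range(5):
        test = folds[i]
        rest = rng.permutation(
            np.concatenate([folds[j] for j in range(5) if j != i])
        )
        cut = int(len(rest) * 0.75)  # 3:1 train/validation split
        yield rest[:cut], rest[cut:], test
```

For 100 merged samples each fold holds 20; the remaining 80 are shuffled and split 60/20, matching the stated 60%-20%-20% ratio.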
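The batch-size rule quoted in the Experiment Setup row can be written out as a one-liner. This is a sketch; the integer flooring and the lower bound of 1 are our assumptions for edge cases, not stated in the paper.

```python
def batch_size(n_train):
    """Batch size rule from Wang et al. (2017): min(n_train / 10, 16),
    floored to an integer and clamped to at least 1 (our assumption)."""
    return max(1, min(n_train // 10, 16))
```

So a dataset with 1000 training samples trains with batches of 16, while one with 50 samples uses batches of 5.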