MONSTER: Monash Scalable Time Series Evaluation Repository
Authors: Angus Dempster, Navid Mohammadi Foumani, Chang Wei Tan, Lynn Miller, Amish Mishra, Mahsa Salehi, Charlotte Pelletier, Daniel F. Schmidt, Geoffrey I. Webb
DMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 provides preliminary baseline results for selected methods. In particular, we provide results for four deep learning models: ConvTran (Foumani et al., 2024b), FCN (Wang et al., 2017), H-InceptionTime (Ismail-Fawaz et al., 2022), and TempCNN (Pelletier et al., 2019). We include results for two more traditional, specialised methods for time series classification: Hydra (Dempster et al., 2023), and Quant (Dempster et al., 2024a). We also include results for a standard, off-the-shelf classifier, extremely randomised trees (Geurts et al., 2006), to act as a naïve baseline. We provide results for 0-1 loss, log loss, weighted F1 score, balanced accuracy, and training time. Each method is evaluated on each dataset using 5-fold cross-validation, using predefined cross-validation folds. |
| Researcher Affiliation | Academia | Monash University, Melbourne, Australia; Université Bretagne Sud, IRISA, Vannes, France |
| Pseudocode | No | The paper describes various methods (FCN, TempCNN, H-InceptionTime, ConvTran, Hydra, Quant, and Extremely Randomised Trees) in textual paragraphs but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Relevant code is available at: https://github.com/Navidfoumani/monster. |
| Open Datasets | Yes | The datasets are available via Hugging Face: https://huggingface.co/monster-monash. Additional information in relation to hosting is provided in Appendix B. Relevant code is available at: https://github.com/Navidfoumani/monster. We provide the datasets in .npy format to allow for ease of use with Python and straightforward memory mapping. (We also provide the datasets in legacy .csv format.) All datasets are under creative commons licenses or in the public domain, or we otherwise have been given permission to include the dataset in this collection. All datasets are already publicly available in some form. |
| Dataset Splits | Yes | Each dataset is provided with a set of indices for 5-fold cross-validation, allowing for direct comparison between benchmark results. For some datasets, these simply represent stratified random cross-validation folds. For other datasets, the cross-validation folds have been generated taking into account important metadata, e.g., different experimental subjects (for EEG data), or different geographic locations (for satellite image time series data). |
| Hardware Specification | Yes | The five methods trained using GPUs were each trained using a single GPU, either an Ampere A100 SMX4 with 80GB RAM, or an Ampere A40 with 48GB RAM. |
| Software Dependencies | No | The paper mentions the Adam optimiser and the Torcheeg toolkit, but does not provide specific version numbers for any software, libraries, or programming languages used in their experiments. For example, it mentions 'Python' but without a version. |
| Experiment Setup | Yes | The four deep learning models are trained using the Adam optimiser (Kingma and Ba, 2015) and a batch size of 256 for a maximum of 100 epochs. The one exception is H-InceptionTime with the AudioMNIST dataset, which used a batch size of 64 to enable it to fit in the GPU memory. For all datasets, we implement early stopping and select the best epoch found as the final model, using a validation set obtained by randomly selecting 10% of the training dataset. Training time on each fold is limited to approximately 24 hours or one epoch, whichever is longer. |
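The table notes that datasets ship in `.npy` format for straightforward memory mapping, with predefined 5-fold cross-validation indices. A minimal sketch of that access pattern is below; the filename `toy_data.npy` and the fold construction are fabricated for illustration (the actual MONSTER files and fold indices come from the Hugging Face hub), but the memory-mapped loading and train/test split mechanics are what the paper describes.

```python
import numpy as np

# Fabricated stand-in for a MONSTER dataset: 100 univariate series of length 32.
# Real datasets are downloaded from https://huggingface.co/monster-monash.
X = np.random.rand(100, 1, 32).astype(np.float32)
np.save("toy_data.npy", X)

# Memory-map the .npy file so large datasets are not read into RAM all at once.
X_mm = np.load("toy_data.npy", mmap_mode="r")

# MONSTER provides predefined 5-fold CV indices per dataset; here we simply
# split the index range into five contiguous folds as a placeholder.
folds = np.array_split(np.arange(len(X_mm)), 5)
test_idx = folds[0]
train_idx = np.concatenate(folds[1:])

# Fancy indexing on a memmap materialises only the requested rows.
X_train, X_test = X_mm[train_idx], X_mm[test_idx]
print(X_train.shape, X_test.shape)
```

Using the repository's supplied fold indices instead of ad-hoc splits is what allows direct comparison between benchmark results.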
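The naïve baseline named above, extremely randomised trees (Geurts et al., 2006), is available off the shelf in scikit-learn. The sketch below shows one plausible way to evaluate it with 5-fold cross-validation and 0-1 loss, two of the metrics listed in the table; the synthetic data and self-generated folds are assumptions standing in for the real datasets and their predefined fold indices, and the hyperparameters are scikit-learn defaults, not the paper's settings.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import zero_one_loss

# Synthetic stand-in data with the (n_samples, n_channels, length) layout
# used by the .npy datasets; labels are random, so scores are near chance.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3, 16))
y = rng.integers(0, 2, size=100)

# A tabular classifier expects 2-D input, so flatten channels x time.
X_flat = X.reshape(len(X), -1)

# Placeholder folds; MONSTER supplies predefined 5-fold CV indices instead.
folds = np.array_split(np.arange(len(X)), 5)

scores = []
for i, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    clf = ExtraTreesClassifier(n_estimators=100, random_state=0)
    clf.fit(X_flat[train_idx], y[train_idx])
    # 0-1 loss: fraction of misclassified test examples on this fold.
    scores.append(zero_one_loss(y[test_idx], clf.predict(X_flat[test_idx])))

print(np.mean(scores))
```

The other listed metrics (log loss, weighted F1, balanced accuracy) follow the same per-fold pattern via `sklearn.metrics.log_loss`, `f1_score`, and `balanced_accuracy_score`.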