MONSTER: Monash Scalable Time Series Evaluation Repository

Authors: Angus Dempster, Navid Mohammadi Foumani, Chang Wei Tan, Lynn Miller, Amish Mishra, Mahsa Salehi, Charlotte Pelletier, Daniel F. Schmidt, Geoffrey I. Webb

DMLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Section 4 provides preliminary baseline results for selected methods. In particular, we provide results for four deep learning models: ConvTran (Foumani et al., 2024b), FCN (Wang et al., 2017), H-InceptionTime (Ismail-Fawaz et al., 2022), and TempCNN (Pelletier et al., 2019). We include results for two more traditional, specialised methods for time series classification: Hydra (Dempster et al., 2023) and Quant (Dempster et al., 2024a). We also include results for a standard, off-the-shelf classifier, extremely randomised trees (Geurts et al., 2006), to act as a naïve baseline. We provide results for 0-1 loss, log loss, weighted F1 score, balanced accuracy, and training time. Each method is evaluated on each dataset using 5-fold cross-validation, using predefined cross-validation folds.
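All four reported classification metrics are standard and available in scikit-learn; a minimal sketch of computing them, assuming scikit-learn is installed (the labels and predictions below are placeholders for illustration, not results from the paper):

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, balanced_accuracy_score, f1_score, log_loss,
)

# Placeholder ground truth, hard predictions, and class probabilities.
y_true = np.array([0, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 1])
y_proba = np.array([[0.8, 0.2], [0.3, 0.7], [0.6, 0.4],
                    [0.9, 0.1], [0.2, 0.8], [0.4, 0.6]])

zero_one = 1.0 - accuracy_score(y_true, y_pred)       # 0-1 loss
ll = log_loss(y_true, y_proba)                        # log loss
wf1 = f1_score(y_true, y_pred, average="weighted")    # weighted F1
bal = balanced_accuracy_score(y_true, y_pred)         # balanced accuracy
```

Under the paper's protocol these would be computed on each of the five predefined test folds and aggregated per dataset.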
Researcher Affiliation Academia Monash University, Melbourne, Australia; Université Bretagne Sud, IRISA, Vannes, France
Pseudocode No The paper describes various methods (FCN, TempCNN, H-InceptionTime, ConvTran, Hydra, Quant, and Extremely Randomised Trees) in textual paragraphs but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code Yes Relevant code is available at: https://github.com/Navidfoumani/monster.
Open Datasets Yes The datasets are available via Hugging Face: https://huggingface.co/monster-monash. Additional information in relation to hosting is provided in Appendix B. Relevant code is available at: https://github.com/Navidfoumani/monster. We provide the datasets in .npy format to allow for ease of use with Python and straightforward memory mapping. (We also provide the datasets in legacy .csv format.) All datasets are under creative commons licenses or in the public domain, or we otherwise have been given permission to include the dataset in this collection. All datasets are already publicly available in some form.
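Because the datasets ship as .npy files, they can be memory-mapped so that only the slices actually accessed are read from disk. A minimal sketch with NumPy, using a locally created stand-in file rather than an actual MONSTER download:

```python
import numpy as np

# Stand-in array; a real MONSTER dataset would instead be fetched
# from https://huggingface.co/monster-monash and loaded the same way.
X = np.random.rand(100, 1, 64).astype(np.float32)
np.save("example_data.npy", X)

# mmap_mode="r" maps the file read-only; pages are loaded lazily,
# so very large arrays can be indexed without filling RAM.
X_mm = np.load("example_data.npy", mmap_mode="r")
batch = X_mm[:32]  # only the first 32 examples are read from disk
```

This lazy-loading property is presumably what the authors mean by "straightforward memory mapping" being easier with .npy than with the legacy .csv format.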
Dataset Splits Yes Each dataset is provided with a set of indices for 5-fold cross-validation, allowing for direct comparison between benchmark results. For some datasets, these simply represent stratified random cross-validation folds. For other datasets, the cross-validation folds have been generated taking into account important metadata, e.g., different experimental subjects (for EEG data), or different geographic locations (for satellite image time series data).
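Given per-fold index arrays like those the repository provides, cross-validation reduces to iterating over the folds and taking each fold's complement as the training set. A hedged sketch, where the fold construction is illustrative rather than the repository's actual index files:

```python
import numpy as np

n = 50
rng = np.random.default_rng(0)

# Stand-in for the predefined folds shipped with each dataset:
# five disjoint index arrays that together cover all examples.
perm = rng.permutation(n)
folds = np.array_split(perm, 5)

for k, test_idx in enumerate(folds):
    train_idx = np.setdiff1d(perm, test_idx)
    # ... train on train_idx, evaluate on test_idx ...
    assert len(np.intersect1d(train_idx, test_idx)) == 0
```

Shipping the indices themselves (rather than a random seed) is what makes benchmark results directly comparable across papers, including for the grouped splits by subject or geographic location.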
Hardware Specification Yes The five methods trained using GPUs were each trained using a single GPU, either an Ampere A100 SMX4 with 80GB RAM, or an Ampere A40 with 48GB RAM.
Software Dependencies No The paper mentions the Adam optimiser and the TorchEEG toolkit, but does not provide specific version numbers for any software, libraries, or programming languages used in their experiments. For example, it mentions 'Python' but without a version.
Experiment Setup Yes The four deep learning models are trained using the Adam optimiser (Kingma and Ba, 2015) and a batch size of 256 for a maximum of 100 epochs. The one exception is H-InceptionTime with the AudioMNIST dataset, which used a batch size of 64 to enable it to fit in the GPU memory. For all datasets, we implement early stopping and select the best epoch found as the final model, using a validation set obtained by randomly selecting 10% of the training dataset. Training time on each fold is limited to approximately 24 hours or one epoch, whichever is longer.
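The best-epoch selection rule described above (hold out 10% of the training data, keep the epoch with the best validation score) can be sketched in a framework-agnostic way; the validation losses below are placeholders, not values from the paper:

```python
import numpy as np

def best_epoch(val_losses):
    """Index of the epoch with the lowest validation loss, i.e. the
    epoch whose weights would be restored as the final model."""
    return int(np.argmin(val_losses))

# Hold out 10% of a hypothetical 1000-example training set as validation.
rng = np.random.default_rng(0)
idx = rng.permutation(1000)
val_idx, train_idx = idx[:100], idx[100:]

# Placeholder per-epoch validation losses (up to 100 epochs in the paper).
val_losses = [0.9, 0.6, 0.45, 0.4, 0.42, 0.41, 0.5]
chosen = best_epoch(val_losses)  # epoch 3 has the lowest loss
```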