S4M: S4 for multivariate time series forecasting with Missing values
Authors: Jing Peng, Meiqi Yang, Qiong Zhang, Xiaoxiao Li
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive empirical evaluations on diverse real-world datasets, we demonstrate that S4M consistently achieves state-of-the-art performance. These results underscore the efficacy of our integrated approach in handling missing data, showcasing its robustness and superiority over traditional imputation-based methods. Our findings highlight the potential of S4M to advance reliable time series forecasting in practical applications, offering a promising direction for future research and deployment. |
| Researcher Affiliation | Academia | Jing Peng¹, Meiqi Yang², Qiong Zhang¹, Xiaoxiao Li³,⁴ (¹Renmin University of China, ²Princeton University, ³The University of British Columbia, ⁴Vector Institute) |
| Pseudocode | Yes | Algorithm 1 Bank Reading, Algorithm 2 Bank Writing, Algorithm 3 Testing Pipeline, Algorithm 4 Training Pipeline |
| Open Source Code | Yes | Code is available at https://github.com/WINTERWEEL/S4M.git. |
| Open Datasets | Yes | We select four commonly used time series datasets for forecasting: Electricity (Wu et al., 2021), ETTh1 (Zhou et al., 2021), Traffic (Wu et al., 2021), and Weather (Wu et al., 2021). For a more general evaluation, we also include a real-world dataset, the USHCN climate dataset (Menne et al., 2015), with 271,728 time steps and 10 variables in total. |
| Dataset Splits | Yes | After obtaining the dataset with missing values, we split it chronologically into training, validation, and test sets, with a ratio of 0.7/0.1/0.2. |
| Hardware Specification | No | To measure the training and inference time, we conducted performance experiments using the Electricity dataset, with a batch size of 16 and a hidden size of 512. The maximum memory usage, along with the training and inference times, was recorded for a single epoch. (This text reports computational metrics only and does not name specific hardware such as GPU/CPU models.) |
| Software Dependencies | No | The learning rates are set to 0.01 for the Electricity and Traffic datasets, 0.005 for the ETTh1 dataset, and 0.001 for the Weather dataset. The dimensions of the hidden layers are set to 512 for the Electricity and Traffic datasets, and 256 for the ETTh1 and Weather datasets. The number of basic blocks or layers is selected from {2, 4, 8}. The batch size is set to 16 for all experiments. We use the Adam optimizer and implement an early stopping strategy across all experiments. (This text mentions the Adam optimizer but gives no version numbers for software libraries or frameworks, e.g., Python, PyTorch, or TensorFlow.) |
| Experiment Setup | Yes | The learning rates are set to 0.01 for the Electricity and Traffic datasets, 0.005 for the ETTh1 dataset, and 0.001 for the Weather dataset. The dimensions of the hidden layers are set to 512 for the Electricity and Traffic datasets, and 256 for the ETTh1 and Weather datasets. The number of basic blocks or layers is selected from {2, 4, 8}. The batch size is set to 16 for all experiments. We use the Adam optimizer and implement an early stopping strategy across all experiments. |
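The chronological 0.7/0.1/0.2 split quoted in the Dataset Splits row can be sketched as follows. This is a minimal illustration, not the paper's actual preprocessing code; the function name and the use of integer truncation at the boundaries are assumptions.

```python
def chronological_split(series, ratios=(0.7, 0.1, 0.2)):
    """Split a time-ordered sequence into train/val/test sets without shuffling,
    so that earlier observations are never used to predict the past."""
    n = len(series)
    train_end = int(n * ratios[0])          # first 70% for training
    val_end = train_end + int(n * ratios[1])  # next 10% for validation
    return series[:train_end], series[train_end:val_end], series[val_end:]

# Example using the USHCN length quoted above (271,728 time steps).
train, val, test = chronological_split(list(range(271728)))
```

Because the split is chronological rather than random, the test set covers the most recent 20% of the series, which matches standard practice in forecasting evaluation.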
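The Experiment Setup row lists per-dataset learning rates and hidden sizes plus an early stopping strategy. A minimal sketch of both is below; the `EarlyStopping` class, its `patience` value, and the `configs` dictionary layout are assumptions (the quoted text does not specify a patience or how hyperparameters are stored), while the numeric values come from the row above.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True means: stop training

# Per-dataset hyperparameters quoted in the Experiment Setup row.
configs = {
    "Electricity": {"lr": 0.01,  "hidden": 512},
    "Traffic":     {"lr": 0.01,  "hidden": 512},
    "ETTh1":       {"lr": 0.005, "hidden": 256},
    "Weather":     {"lr": 0.001, "hidden": 256},
}
```

In a training loop, `step` would be called once per epoch with the current validation loss, and training would break as soon as it returns `True`.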