Diffusion-based Time Series Imputation and Forecasting with Structured State Space Models
Authors: Juan Miguel Lopez Alcaraz, Nils Strodthoff
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that SSSD matches or even exceeds state-of-the-art probabilistic imputation and forecasting performance on a broad range of data sets and different missingness scenarios, including the challenging blackout-missing scenarios, where prior approaches failed to provide meaningful results. (3) We provide extensive experimental evidence for the superiority of the proposed approach compared to state-of-the-art approaches on different data sets for various missingness approaches, particularly for the most challenging blackout and forecasting scenarios. Throughout this section, we always train and evaluate imputation models on identical missingness scenarios and ratios, e.g., we train on 20% RM and evaluate based on the same setting. |
| Researcher Affiliation | Academia | Juan Miguel Lopez Alcaraz, Division AI4Health, Oldenburg University; Nils Strodthoff, Division AI4Health, Oldenburg University |
| Pseudocode | Yes | As described in Section 2.2, we only apply noise to the portions of the time series to be imputed with training and sampling procedures described explicitly along with pseudocode in Appendix A.4. Algorithm 1 Training step. Algorithm 2 Sampling algorithm. |
| Open Source Code | Yes | The source code underlying our experiments is available under https://github.com/AI4HealthUOL/SSSD. |
| Open Datasets | Yes | As first data set, we consider ECG data from the PTB-XL data set (Wagner et al., 2020a;b; Goldberger et al., 2000). To demonstrate that the excellent performance of SSSDS4 extends to further data sets, we collected the MuJoCo data set (Rubanova et al., 2019) from Shan et al. (2021). We implemented the RM imputation task on the Electricity data set (Dua & Graff, 2017). We test on the Solar data set collected from GluonTS (Alexandrov et al., 2020). To this end, we collected the preprocessed ETTm1 data set from Zhou et al. (2021). |
| Dataset Splits | Yes | The PTB-XL ECG data set consists of 21837 clinical 12-lead ECGs, each lasting 10 seconds, from 18885 patients. For the three imputation and forecasting scenarios, we utilize 20% as target values. For the 248 time steps setting, the data set was preprocessed on crops, which corresponds to 69,764 training and 8,812 test samples. By convention, there is an 80/20 random split for training and testing [MuJoCo data set]. The task is to condition on 168 time steps to forecast the following 24. The whole data set contains 73 samples of 192 time steps and 128 features each in chronological order; we use the first 65 as training set and the remaining 8 as test set [Solar data set]. We forecast in each of the benchmarking horizons, utilizing train and test sets coming from the original train/val/test split of 12/4/4 months, respectively [ETTm1 data set]. |
| Hardware Specification | Yes | The training on the models was performed on single NVIDIA A30 cards. |
| Software Dependencies | No | The paper mentions optimizers like 'Adam' and loss functions like 'MSE' but does not specify versions for any programming languages or libraries (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | Table 12: CSDI hyperparameters. Residual layers 4; Residual channels 64; Diffusion embedding dim. 128; Schedule Quadratic; Diffusion steps T 50; β0 0.0001; β1 0.5; Feature embedding dim. 128; Time embedding dim. 16; Self-attention layers (time dim.) 1; Self-attention heads (time dim.) 8; Self-attention layers (feature dim.) 1; Self-attention heads (feature dim.) 8; Optimizer Adam; Loss function MSE; Learning rate 1×10⁻³; Weight decay 1×10⁻⁶. Similar tables are provided for DiffWave, SSSDSA, SSSDS4, and S4 models in Appendix C. |
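The Experiment Setup row quotes a quadratic noise schedule with T = 50 diffusion steps, β0 = 0.0001, and β1 = 0.5. A minimal sketch of such a schedule is below; the exact interpolation is an assumption, since "quadratic" is commonly implemented (e.g., in CSDI) as a linear ramp in √β that is then squared, which by construction reproduces the stated endpoints.

```python
import numpy as np

# Assumed quadratic schedule: interpolate linearly in sqrt(beta), then square.
# Values (T, beta_0, beta_1) are taken from the quoted Table 12.
T = 50
beta_0, beta_1 = 1e-4, 0.5
betas = np.linspace(beta_0 ** 0.5, beta_1 ** 0.5, T) ** 2

# alpha_bar_t = prod_{s<=t} (1 - beta_s) gives the closed-form noising
# of x_0 at step t: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

print(betas[0], betas[-1])  # endpoints recover beta_0 and beta_1
```

By design the schedule starts at β0 and ends at β1, and the cumulative products `alpha_bars` decrease monotonically toward zero, so late diffusion steps are close to pure noise.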
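The Pseudocode row states that noise is applied only to the portions of the time series to be imputed, and the Research Type row mentions training on 20% random missing (RM). The sketch below combines the two under stated assumptions: `rm_mask` and `training_step` are hypothetical helpers (not from the paper), and a zero prediction stands in for the denoising network so the masked MSE objective can be shown end to end.

```python
import numpy as np

rng = np.random.default_rng(0)

def rm_mask(shape, ratio=0.2, rng=rng):
    """Random-missing (RM) mask: 1 marks observed (conditioning) values,
    0 marks the ~20% of entries to impute. Hypothetical helper."""
    return (rng.random(shape) >= ratio).astype(np.float64)

def training_step(x0, betas, rng=rng):
    """One masked-diffusion training step, as described in the quoted text:
    only the entries to be imputed are noised; a denoiser would regress the
    injected noise on those entries with an MSE loss."""
    alpha_bars = np.cumprod(1.0 - betas)
    t = rng.integers(len(betas))             # random diffusion step
    mask = rm_mask(x0.shape)                 # conditioning mask
    eps = rng.standard_normal(x0.shape)      # noise to be regressed
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    # Observed entries stay clean; only masked-out entries are noised.
    model_input = mask * x0 + (1.0 - mask) * x_t
    # With a real network: eps_hat = model(model_input, t, mask).
    # Here a zero prediction illustrates the masked MSE objective only.
    eps_hat = np.zeros_like(model_input)
    masked = 1.0 - mask
    loss = np.sum(((eps_hat - eps) * masked) ** 2) / max(masked.sum(), 1.0)
    return loss

betas = np.linspace(1e-4 ** 0.5, 0.5 ** 0.5, 50) ** 2
x0 = rng.standard_normal((12, 250))          # e.g. a 12-lead ECG crop
print(training_step(x0, betas))              # scalar masked-MSE loss
```

Restricting both the noising and the loss to the masked entries mirrors the quoted training/evaluation protocol, where the model is always conditioned on the observed portion of the series.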