Self-attention-based Diffusion Model for Time-series Imputation in Partial Blackout Scenarios

Authors: Mohammad Rafid Ul Islam, Prasad Tadepalli, Alan Fern

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on benchmark and two real-world time series datasets demonstrate that our model outperforms the state-of-the-art in partial blackout scenarios and shows better scalability. ... We evaluate our model in several real-world time series domains with randomly missing training data and show that it outperforms and scales better than the other state-of-the-art models under partial blackout scenarios. ... Ablation Study: Our model, SADI, has three core features: (1) the FDE (feature dependency encoder) block that models feature intercorrelations, (2) the two-stage imputation process, and (3) the weighted combination of the two intermediate imputations. Now, we will do an ablation study to show the impact of these three design decisions.
Researcher Affiliation | Academia | Mohammad Rafid Ul Islam, Prasad Tadepalli, Alan Fern, Oregon State University (EMAIL, EMAIL, EMAIL)
Pseudocode | Yes | Algorithm 1: Training of our diffusion model. Input: Distribution of training data X_0 ∼ q(X_0), the number of iterations/epochs E, the list of noise levels (α_1, ..., α_T). Output: Denoising function ϵ_θ. ... Algorithm 2: Sampling process. Input: Data sample X_0, missingness mask M_0^co, total number of diffusion steps T, trained denoising function ϵ_θ. Output: Imputed missing values X_0^ta
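The quoted pseudocode follows the standard denoising-diffusion training recipe. A minimal NumPy sketch of one training step with a masked noise-prediction loss is shown below; the function `eps_theta`, the array shapes, and the choice to restrict the loss to masked-out entries are illustrative assumptions, not the authors' SADI implementation.

```python
import numpy as np

def train_step(x0, mask, alphas_bar, eps_theta, rng):
    """One illustrative diffusion training step for imputation.

    x0:         (K, L) clean data (K features, L time steps).
    mask:       (K, L) with 1 = observed (conditioning), 0 = imputation target.
    alphas_bar: cumulative noise schedule, one value per diffusion step t.
    eps_theta:  denoiser fn (x_t, t, mask) -> predicted noise, same shape as x0.
    Returns the scalar training loss for this step.
    """
    T = len(alphas_bar)
    t = rng.integers(T)                          # sample a diffusion step uniformly
    eps = rng.standard_normal(x0.shape)          # Gaussian noise
    a = alphas_bar[t]
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps  # forward-noised sample
    pred = eps_theta(x_t, t, mask)               # predict the injected noise
    # Penalize noise-prediction error only on the imputation targets (mask == 0).
    loss = np.mean(((pred - eps) * (1 - mask)) ** 2)
    return loss
```

Algorithm 2 would then run the usual reverse process: starting from Gaussian noise on the target entries, repeatedly denoise with ϵ_θ while keeping the observed entries (mask = 1) fixed at their true values.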
Open Source Code | No | The paper does not explicitly state that the authors are releasing their own code for the methodology described. It mentions a GitHub repository for CSDI, a different model, but not for SADI.
Open Datasets | Yes | The first dataset is a grape cultivar cold hardiness dataset from AgAID, which measures grape plant characteristics such as resistance to cold weather along with a number of environmental factors at regular intervals. ... Air Quality is a popular dataset considered in (Yi et al. 2016), among others. ... Another widely known dataset is the Electricity Load Diagram from the public UCI Machine Learning Repository (Dua and Graff 2017). ... The last dataset examined in this study consists of temperature data sourced from the Northwest Alliance for Computational Science & Engineering (NACSE) PRISM climate data.
Dataset Splits | Yes | It spans 34 seasons (1988 to 2022), with the last 2 seasons set aside for testing. ... There are 48 months worth of data. We designate the first 10 months of data as the test set, the subsequent 10 months as the validation set, and the remaining data as the training set, the same as (Du, Côté, and Liu 2023). ... For our experimental setup, we reserve the last 2 years for testing.
Hardware Specification | No | In these experiments, we have observed that CSDI (code taken from one of the authors' GitHub repositories) requires a huge amount of GPU memory when dealing with high-dimensional data such as the Electricity and NACSE datasets. For these two datasets, we had to reduce the number of channels to 8 because of our GPU constraints, which may have had some negative effect on its performance shown in Tables 1 and 2. ... Moreover, SADI requires less GPU memory than CSDI for training and inference, making it suitable for diverse applications. While running CSDI on large datasets like Electricity and NACSE required reducing the number of channels due to memory constraints, SADI handled the same capacity without such adjustments. The paper mentions 'GPU memory' and 'GPU constraints' but does not specify GPU models or other detailed hardware specifications such as CPU or RAM.
Software Dependencies | No | The paper does not explicitly mention any specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x).
Experiment Setup | No | For each test trial, we uniformly select which features are missing and select two blocks to be missing for 10 (in the case of the Air Quality dataset) or 30 (in the case of other datasets) consecutive time steps. For CSDI and SADI, we generated 50 predicted samples to approximate the probability distribution of the missing data. We used the mean for SADI and the median for CSDI (as proposed in their paper) for the final prediction. ... The number of FDE layers is controlled by the hyperparameter N_FDE. The GTA comprises multiple layers, controlled by the hyperparameter N_GTA. The paper describes aspects of the experimental procedure (e.g., how missingness is generated, number of inference samples) and mentions that hyperparameters exist, but does not provide concrete numerical values for essential training hyperparameters such as learning rate, batch size, number of epochs, optimizer settings, or the specific values of N_FDE and N_GTA.
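The quoted masking procedure (uniformly selected features, two blocks of consecutive time steps made missing) can be sketched as below. The fraction of features masked (`feat_frac`) is an assumption for illustration, since the excerpt does not state it; only the block count and block length come from the quoted text.

```python
import numpy as np

def partial_blackout_mask(num_features, num_steps, block_len=30,
                          num_blocks=2, feat_frac=0.5, rng=None):
    """Illustrative test-time mask in the spirit of the quoted setup:
    uniformly pick a subset of features, then hide `num_blocks` blocks of
    `block_len` consecutive time steps for those features.

    `feat_frac` is an assumed parameter, not taken from the paper.
    Returns an (num_features, num_steps) array: 1 = observed, 0 = missing.
    """
    rng = rng or np.random.default_rng()
    mask = np.ones((num_features, num_steps), dtype=int)
    k = max(1, int(feat_frac * num_features))
    feats = rng.choice(num_features, size=k, replace=False)  # uniform feature subset
    for _ in range(num_blocks):
        # Choose a block start so the block fits inside the series.
        start = rng.integers(0, num_steps - block_len + 1)
        mask[feats, start:start + block_len] = 0
    return mask
```

Under such a mask, the excerpt reports drawing 50 imputed samples per model and aggregating them with the mean (SADI) or the median (CSDI, as proposed in its paper).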