TimeInf: Time Series Data Contribution via Influence Functions
Authors: Yizi Zhang, Jingyan Shen, Xiaoxue Xiong, Yongchan Kwon
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate its effectiveness in time series anomaly detection. On multiple real-world datasets, our method outperforms existing approaches in detecting harmful anomalies. TimeInf provides intuitive data value attributions, facilitating the identification of harmful anomalies and beneficial temporal patterns through visualizations of the computed influence scores. We present a potentially important application of TimeInf in identifying mislabeled anomalies in the ground truth annotations. |
| Researcher Affiliation | Academia | Yizi Zhang, Jingyan Shen, Xiaoxue Xiong, Yongchan Kwon (Columbia University) |
| Pseudocode | Yes | Algorithm 1: TimeInf Estimator. Require: time point of interest z; training time blocks {x_1^[m], ..., x_n^[m]}; test time block z_test^[m]; index set S_z of time blocks that contain the time point of interest; model parameters θ̂; loss function ℓ; block length m. Ensure: TimeInf I_time(z, z_test^[m]). (Optional if θ̂ is ready) Learn the parameter θ̂ by training a model on {x_1^[m], ..., x_n^[m]}. 1. Compute the gradients at the training time blocks, ψ(x_i^[m], θ̂) = dℓ(x_i^[m], θ̂)/dθ for all i ∈ {1, ..., n}, and the Hessian H_θ̂. 2. Compute the gradient at the test time block, ψ(z_test^[m], θ̂) = dℓ(z_test^[m], θ̂)/dθ. 3. For each i ∈ S_z: I_time(z, z_test^[m]) ← I_time(z, z_test^[m]) + (1/|S_z|) ψ(z_test^[m], θ̂)^T H_θ̂^{-1} ψ(x_i^[m], θ̂). |
| Open Source Code | No | The paper does not provide an explicit statement about the release of source code or a link to a code repository. |
| Open Datasets | Yes | We extensively evaluate TimeInf on five benchmark datasets: UCR (Wu & Keogh, 2021), SMAP (Hundman et al., 2018), MSL, NAB-Traffic, and SMD (Su et al., 2019). The Traffic dataset (https://pems.dot.ca.gov/) aggregates hourly road occupancy rates over 48 months from 862 sensors across California. The Solar Energy dataset (https://www.nrel.gov/grid/solar-power-data.html) comprises records from 137 photovoltaic power plants in Alabama State. The Electricity dataset includes the electricity consumption patterns of 321 clients from 2012 to 2014 (Trindade, 2015). The Exchange Rate dataset includes daily exchange rates for 8 countries, covering the period from 1990 to 2016 (Lai et al., 2018). Trindade. ElectricityLoadDiagrams20112014. UCI Machine Learning Repository, 2015. DOI: https://doi.org/10.24432/C58C86. |
| Dataset Splits | Yes | For each dataset, we conduct 100 independent experiments, each time randomly sampling 5,000 consecutive data points. These points are then sequentially partitioned into a training set, a validation set, and a test set, with fixed sizes of 3000, 1000, and 1000 points, respectively. |
| Hardware Specification | No | We acknowledge computing resources from Columbia University's Shared Research Computing Facility project, which is supported by NIH Research Facility Improvement Grant 1G20RR03089301, and associated funds from the New York State Empire State Development, Division of Science Technology and Innovation (NYSTAR) Contract C090171, both awarded April 15, 2010. The paper mentions computing resources but does not specify exact hardware models (e.g., GPU, CPU models). |
| Software Dependencies | No | The paper mentions various models and algorithms used (e.g., RNN, LSTM, PatchTST, gradient boosting regressor, AR model, ARIMA, VAR), and also mentions "automatic differentiation packages," but it does not provide specific version numbers for any software libraries, frameworks, or tools. |
| Experiment Setup | Yes | For AR-based models, we use ARIMA for univariate time series, and VAR for multivariate time series. We use time blocks of length 100, as empirical results show optimal model performance at this length with minimal gains beyond; see Appendix Section D. Following the practice of Jiang et al. (2023), we apply the k-Means clustering algorithm to the calculated anomaly scores, using two clusters to partition the scores into normal and anomaly time points. For RNN and LSTM, we apply the conjugate gradient (CG) method used in Koh & Liang (2017) to compute TimeInf for differentiable black-box models. For the large nonlinear PatchTST model, we use the scalable Hessian-free method (Pruthi et al., 2020) to compute TimeInf. For gradient boosting, we compute nonparametric TimeInf according to Feldman & Zhang (2020); see details in Appendix Section B. All models use MSE as the loss function. Experiments on Different Block Lengths. The performance of TimeInf in anomaly detection can be affected by the block length, which determines the order of the AR model used in its computation. Experiments on Time Stride. The time stride, which determines the number of time blocks a time point appears in, affects the anomaly detection performance of TimeInf. Table 3 shows that as the time stride increases, the AUC decreases, with the threshold varying across datasets. Larger strides provide fewer time blocks containing the point, limiting the sampling of temporal configurations around it. Consequently, with fewer samples, the TimeInf estimate becomes less reliable, which is why we choose stride 1 in our experiments. |
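Since no official code is released, the estimator in Algorithm 1 can be sketched for the simplest case the paper covers: a linear AR model with MSE loss, where the per-block gradient and Hessian are available in closed form (so no CG or Hessian-free approximation is needed). This is a minimal numpy sketch, not the authors' implementation; in particular, using the mean training gradient as a stand-in for the held-out test-block gradient is an assumption made here to keep the example self-contained, and the small damping term added to the Hessian is likewise an illustrative choice.

```python
import numpy as np

def timeinf_scores(series, order=10, stride=1):
    """Sketch of TimeInf for a linear AR model with MSE loss.

    Builds overlapping blocks of `order` lagged values, fits AR
    coefficients by least squares, computes per-block gradients and
    the empirical Hessian in closed form, then aggregates block
    influences back to per-time-point scores by averaging over all
    blocks that contain each point (the 1/|S_z| average in Alg. 1).
    """
    p = order
    # Overlapping training blocks: features = p lags, target = next value.
    X = np.stack([series[i:i + p] for i in range(0, len(series) - p, stride)])
    y = series[p::stride][:len(X)]
    n = len(X)

    # theta_hat: AR coefficients fit by least squares.
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Per-block gradient of the squared loss: g_i = 2 * (x_i @ theta - y_i) * x_i.
    resid = X @ theta - y
    grads = 2.0 * resid[:, None] * X                      # (n, p)

    # Empirical Hessian of the mean loss, damped for numerical stability
    # (the damping is an illustrative choice, not from the paper).
    H = 2.0 * X.T @ X / n + 1e-6 * np.eye(p)
    H_inv_grads = np.linalg.solve(H, grads.T)             # (p, n)

    # Stand-in test-block gradient (assumption: mean training gradient;
    # the paper uses the gradient of a held-out test block).
    test_grad = grads.mean(axis=0)
    block_influence = test_grad @ H_inv_grads             # (n,)

    # Average each block's influence over the time points it contains.
    scores = np.zeros(len(series))
    counts = np.zeros(len(series))
    for i in range(n):
        t0 = i * stride
        scores[t0:t0 + p] += block_influence[i]
        counts[t0:t0 + p] += 1
    return scores / np.maximum(counts, 1)
```

The final loop mirrors the paper's stride discussion: with stride 1 every interior point appears in `order` blocks, giving the densest sampling of temporal configurations, which is why larger strides degrade the estimate.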
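The thresholding step described in the setup (two-cluster k-Means over the anomaly scores, following Jiang et al., 2023) is simple enough to sketch without scikit-learn. This is a minimal 1-D Lloyd's-algorithm sketch; treating the higher-centroid cluster as the anomaly cluster is an assumption made here, since the paper does not state which cluster it labels anomalous.

```python
import numpy as np

def two_means_labels(scores, n_iter=100):
    """Partition 1-D anomaly scores into normal vs. anomaly clusters.

    Runs 2-means (Lloyd's algorithm) on the scores, initializing the
    centroids at the min and max score, and returns a boolean mask
    flagging the higher-centroid cluster as anomalous (an assumption;
    the paper's labeling convention may differ).
    """
    x = np.asarray(scores, dtype=float)
    c = np.array([x.min(), x.max()])      # initial centroids
    for _ in range(n_iter):
        # Assign each score to its nearest centroid.
        labels = (np.abs(x - c[0]) > np.abs(x - c[1])).astype(int)
        # Update centroids; keep the old one if a cluster empties.
        new_c = np.array([x[labels == k].mean() if np.any(labels == k) else c[k]
                          for k in (0, 1)])
        if np.allclose(new_c, c):
            break
        c = new_c
    return labels == int(np.argmax(c))
```

Applied to TimeInf scores, the mask marks the time points whose scores fall in the extreme cluster, replacing a hand-tuned threshold.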