Battling the Non-stationarity in Time Series Forecasting via Test-time Adaptation

Authors: HyunGi Kim, Siwon Kim, Jisoo Mok, Sungroh Yoon

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on diverse benchmark datasets and cutting-edge architectures demonstrate the efficacy and generality of TAFAS, especially in long-term forecasting scenarios that suffer from significant distribution shifts."
Researcher Affiliation | Academia | Hyun Gi Kim¹, Siwon Kim¹, Jisoo Mok¹, Sungroh Yoon¹·²·³. ¹Department of Electrical and Computer Engineering, Seoul National University; ²Interdisciplinary Program in Artificial Intelligence, Seoul National University; ³AIIS, ASRI, and INMC, Seoul National University. EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes the methodology in detail using text and mathematical equations, and refers to an appendix for a summary of the overall pipeline, but it does not contain a clearly labeled pseudocode or algorithm block in the main text.
Open Source Code | Yes | Code: https://github.com/kimanki/TAFAS
Open Datasets | Yes | "We demonstrate the effectiveness of TAFAS using the seven widely used multivariate TSF benchmark datasets: ETTh1, ETTm1, ETTh2, ETTm2, Exchange, Illness, and Weather (Wu et al. 2021)."
Dataset Splits | Yes | "We split datasets in chronological order with the ratio of (0.6, 0.2, 0.2) for ETTh1, ETTm1, ETTh2, and ETTm2 and (0.7, 0.1, 0.2) for Exchange, Illness, and Weather to construct train, validation, and test sets."
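The chronological split quoted above can be sketched as follows. This is a minimal illustration of splitting a series by time order with the stated ratios, not the authors' released code; the function name and array layout (`[T, C]`, time by channels) are assumptions for the example.

```python
import numpy as np

def chronological_split(series: np.ndarray, ratios=(0.6, 0.2, 0.2)):
    """Split a series of shape [T, C] into train/val/test in time order.

    Earlier timesteps go to train, later ones to val and test, so no
    future information leaks into the training portion.
    """
    assert abs(sum(ratios) - 1.0) < 1e-9, "ratios must sum to 1"
    T = len(series)
    n_train = int(T * ratios[0])
    n_val = int(T * ratios[1])
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test = series[n_train + n_val:]
    return train, val, test

# Toy example: 100 timesteps, 1 channel, split (0.6, 0.2, 0.2)
data = np.arange(100, dtype=float).reshape(100, 1)
train, val, test = chronological_split(data)
print(len(train), len(val), len(test))  # 60 20 20
```

For Exchange, Illness, and Weather the paper uses `ratios=(0.7, 0.1, 0.2)` instead.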
Hardware Specification | No | The paper does not provide specific details on the hardware used for experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions various TSF architectures used (e.g., iTransformer, DLinear, FreTS) and normalization modules (RevIN, Dish-TS, SAN), but it does not specify any software libraries or packages with their version numbers.
Experiment Setup | Yes | "We use the look-back window length L = 36 for Illness and L = 96 for the other datasets. For forecasting window length H, we evaluate on 4 different lengths, H ∈ {24, 36, 48, 60} for Illness and H ∈ {96, 192, 336, 720} for the other datasets. We repeat each pre-training run over three different seeds and select the pre-trained source forecaster with the lowest average validation MSE. More details on training processes are provided in Appendix."
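The look-back/forecast windowing described in the setup row can be sketched as a sliding-window pair builder. This is an assumed, generic construction (function name and shapes are illustrative, not from the TAFAS repository): each sample pairs an input window of length L with the immediately following target window of length H.

```python
import numpy as np

def make_windows(series: np.ndarray, L: int = 96, H: int = 96):
    """Build (look-back, horizon) pairs from a series of shape [T, C].

    Returns X of shape [N, L, C] and Y of shape [N, H, C], where
    N = T - L - H + 1 and Y[i] directly follows X[i] in time.
    """
    X, Y = [], []
    for t in range(len(series) - L - H + 1):
        X.append(series[t:t + L])
        Y.append(series[t + L:t + L + H])
    return np.stack(X), np.stack(Y)

# Toy example: 500 timesteps, 7 channels, L = 96, H = 96
data = np.random.randn(500, 7)
X, Y = make_windows(data, L=96, H=96)
print(X.shape, Y.shape)  # (309, 96, 7) (96, 7) per horizon sample
```

For Illness one would use `L=36` with `H` in {24, 36, 48, 60}, matching the quoted configuration.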