Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

General Incomplete Time Series Analysis via Patch Dropping Without Imputation

Authors: Yangyang Wu, Yi Yuan, Mengying Zhu, Xiaoye Miao, Meng Xi

IJCAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type | Experimental | "Extensive experiments on 11 public real-world time series datasets demonstrate that INTER improves accuracy by over 20% compared to state-of-the-art methods, while maintaining competitive computational efficiency." From Section 5 (Experiment): "In this section, we evaluate the performance of our proposed model INTER on five tasks: long-term forecasting, short-term forecasting, imputation, classification, and anomaly detection, using 11 (in)complete multivariate time series datasets."
Researcher Affiliation | Academia | "Yangyang Wu (1,4), Yi Yuan (1), Mengying Zhu (1), Xiaoye Miao (2,3) and Meng Xi (1,4). 1 School of Software Technology, Zhejiang University; 2 Center for Data Science, Zhejiang University; 3 The State Key Lab of Brain-Machine Intelligence, Zhejiang University; 4 Binjiang Institute of Zhejiang University."
Pseudocode | Yes | "The pseudo-code of INTER can be found in Appendix B."
Open Source Code | No | The paper does not explicitly state that the source code for the methodology is openly available, nor does it provide a direct link to a code repository.
Open Datasets | Yes | "Extensive experiments on 11 public real-world time series datasets demonstrate that INTER improves accuracy by over 20% compared to state-of-the-art methods, while maintaining competitive computational efficiency." "For the long-term setting, we use four widely-used public multivariate time series datasets: Electricity [Gasparin et al., 2022], Weather [Zhou et al., 2021], Exchange [Zhang and Berardi, 2001], and Illness [Zhou et al., 2021], covering four real-world scenarios. For the short-term setting, we adopt the M4 [Makridakis, 2018] dataset and its representative subsets, including yearly and monthly collected univariate marketing data. We use three public real-world time series datasets: two representative multivariate datasets from the UEA Time Series Classification Archive [Bagnall et al., 2018] (i.e., Heartbeat and Japanese Vowels), and one public medical dataset (i.e., Physionet) [Goldberger et al., 2000]."
Dataset Splits | No | The paper notes that for forecasting models, "we randomly remove 50% of the observed values prior to model training," and that preprocessing of established datasets such as M4 and UEA follows "the descriptions in [Zerveas et al., 2021]." However, the main text does not provide explicit percentages, sample counts, or a direct methodology for the training, validation, and test splits needed for reproduction.
Hardware Specification | Yes | "The experiments were conducted on a server with an Intel Core 2.80GHz processor, 3 NVIDIA A40 GPUs, and 192GB RAM, running Ubuntu 18.04."
Software Dependencies | No | The paper states that "All approaches were implemented in Python," but specifies neither a Python version nor any other software libraries with version numbers.
Experiment Setup | Yes | "To evaluate the effectiveness of forecasting models on incomplete multivariate time series datasets, we randomly remove 50% of the observed values prior to model training. Each metric value is obtained by averaging the results of five experimental runs on each dataset."