Context-Alignment: Activating and Enhancing LLMs Capabilities in Time Series
Authors: Yuxiao Hu, Qian Li, Dongxiao Zhang, Jinyue Yan, Yuntian Chen
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper states: "Extensive experiments across various TS tasks demonstrate the effectiveness of our method. Notably, in few-shot and zero-shot forecasting tasks, our approach significantly outperforms others, confirming that the logical and structural alignment provides powerful prior knowledge on context. Ablation studies further validate the importance of Context-Alignment." (See Section 4 Experiments and Subsections 4.1-4.7, with detailed results and comparisons in Tables 1-6 and Figures 2-3.) |
| Researcher Affiliation | Academia | 1The Hong Kong Polytechnic University, Hong Kong, China 2Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, China 3Shanghai Jiao Tong University, Shanghai, China 4Zhejiang Key Laboratory of Industrial Intelligence and Digital Twin, Eastern Institute of Technology, Ningbo, China. Email addresses: EMAIL, EMAIL, EMAIL, EMAIL, EMAIL. |
| Pseudocode | No | The paper describes its methodology through detailed text, mathematical formulations (equations 1-14), and an architecture diagram (Figure 1), but it does not include explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is open-sourced at https://github.com/tokaka22/ICLR25-FSCA. |
| Open Datasets | Yes | For the long-term forecasting task, we utilized eight widely used multivariate datasets (Wu et al., 2022), as detailed in Table 7. These include the Electricity Transformer Temperature (ETT) datasets (Zhou et al., 2021), as well as Illness, Weather, Electricity, and Traffic datasets. For the short-term forecasting task, we employed the M4 benchmark dataset (Makridakis et al., 2018)... For the time series classification task, we utilized 10 multivariate UEA datasets from Bagnall et al. (2018). |
| Dataset Splits | Yes | For the time series classification task... Table 9 summarizes the number of classes, series lengths, feature dimensions, and sample sizes for training and testing. (Table 9 explicitly provides 'Train Cases' and 'Test Cases' values). We evaluate the performance of FSCA using only 5% of training data, with results from 10% data detailed in Appendix C.3. For zero-shot forecasting... the model is trained on Dataset A and then tested on Dataset B without utilizing any training data from Dataset B. |
| Hardware Specification | Yes | All deep learning networks are implemented in PyTorch and trained on NVIDIA H800 80GB GPUs and GeForce RTX 4090 GPUs. |
| Software Dependencies | No | All deep learning networks are implemented in PyTorch... (No specific version numbers for PyTorch or other libraries are provided). |
| Experiment Setup | Yes | As discussed in the ablation study in Sec. 4.7 (The Number of LLM Layers), we adopt the first 4 layers of GPT-2. For predictive tasks, we utilize FSCA with N = 2 in Eqs. 5 and 6, providing one demonstration example. The Adam optimizer is used with decay rates β = (0.9, 0.999) and initial learning rates from {10^−4, 5×10^−4}. We implement a cosine annealing schedule with Tmax = 20 and ηmin = 10^−8, and set the batch size to 256. Early stopping is configured throughout the training process. MSE loss is employed for long-term forecasting, while SMAPE loss is used for short-term predictions. In classification tasks... The RAdam optimizer with initial learning rates from {10^−2, 10^−3} and a batch size of 64 is used. Training also incorporates early stopping and employs cross-entropy loss. |
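The learning-rate schedule quoted above can be checked independently of the paper's code. Below is a minimal sketch, assuming the schedule matches the standard cosine-annealing formula (as implemented by PyTorch's `CosineAnnealingLR`); the function name `cosine_annealing_lr` is illustrative, not from the paper.

```python
import math

def cosine_annealing_lr(step, base_lr, t_max=20, eta_min=1e-8):
    """Standard cosine annealing:
    eta_t = eta_min + 0.5 * (base_lr - eta_min) * (1 + cos(pi * step / t_max)).
    Defaults use the paper's reported Tmax = 20 and eta_min = 1e-8.
    """
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max))

# With an initial learning rate of 1e-4 (one of the paper's reported choices),
# the rate starts at base_lr and decays to eta_min over t_max steps:
schedule = [cosine_annealing_lr(t, base_lr=1e-4) for t in range(21)]
print(schedule[0])   # base_lr at step 0
print(schedule[20])  # eta_min at step t_max
```

This reproduces only the schedule's shape; whether FSCA steps it per epoch or per iteration is not stated in the quoted setup.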