FreqLLM: Frequency-Aware Large Language Models for Time Series Forecasting
Authors: Shunnan Wang, Min Gao, Zongwei Wang, Yibing Bai, Feng Jiang, Guansong Pang
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on benchmark datasets demonstrate that FreqLLM outperforms state-of-the-art TSF methods in both accuracy and generalization. |
| Researcher Affiliation | Academia | 1Key Laboratory of Dependable Service Computing in Cyber Physical Society (Chongqing University), Ministry of Education 2School of Big Data and Software Engineering, Chongqing University 3School of Computing and Information Systems, Singapore Management University |
| Pseudocode | No | The paper describes the methodology in text and through diagrams (Figure 2), but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/biya0105/FreqLLM. |
| Open Datasets | Yes | For the long-term forecasting experiments, we test using a variety of datasets, including the Electricity Transformer Temperature (ETT) dataset [Zhou et al., 2021], as well as weather and traffic datasets [Wu et al., 2023], which are widely used for evaluating the long-term forecasting performance of time series models. For short-term experiments, we primarily utilize the M4 benchmark dataset [Makridakis et al., 2018], which consists of time series data from annual, quarterly, monthly, and other categories, featuring large scale, wide coverage, and high-quality data. |
| Dataset Splits | Yes | We used a unified pipeline following the experimental configurations of all baselines [Wu et al., 2023]. In these experiments, we use the top 5% and 10% of the training data. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using GPT-2 as the backbone model and the Adam optimizer, but does not specify version numbers for these or other software libraries (e.g., Python, PyTorch, CUDA). |
| Experiment Setup | Yes | Our method is trained with MSE loss, using the Adam [Kingma et al., 2015] optimizer with an initial learning rate of 10⁻². We maintain the backbone model at 32 layers. We set the patch dimension d_m to 16, the number of heads M to 8, the semantic exemplars size V to 1000, the loss weight λ to 0.08, the sliding window size to 8, and the prompt length K to 8. |
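
The hyperparameters reported in the Experiment Setup row can be collected into a single configuration sketch. This is an illustrative reconstruction only: the key names below are assumptions and are not taken from the paper's released code.

```python
# Hypothetical configuration sketch of FreqLLM's reported training setup.
# All key names are illustrative; values come from the Experiment Setup row.
FREQLLM_CONFIG = {
    "loss": "MSE",                # trained with MSE loss
    "optimizer": "Adam",          # Adam [Kingma et al., 2015]
    "learning_rate": 1e-2,        # initial learning rate 10^-2
    "backbone_layers": 32,        # backbone model kept at 32 layers
    "patch_dim": 16,              # patch dimension d_m
    "num_heads": 8,               # number of heads M
    "semantic_exemplars": 1000,   # semantic exemplars size V
    "loss_weight": 0.08,          # loss-weight hyperparameter
    "sliding_window": 8,          # sliding window size
    "prompt_length": 8,           # prompt length K
}

print(FREQLLM_CONFIG["learning_rate"])
```

Such a dictionary could be passed to a training script or serialized to YAML/JSON when attempting a reproduction run.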