T2S: High-resolution Time Series Generation with Text-to-Series Diffusion Models
Authors: Yunfeng Ge, Jiawei Li, Yiji Zhao, Haomin Wen, Zhao Li, Meikang Qiu, Hongyan Li, Ming Jin, Shirui Pan
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations demonstrate that T2S achieves state-of-the-art performance across 13 datasets spanning 12 domains. [...] We conduct an extensive evaluation across 13 datasets spanning 12 domains to assess the performance of the T2S, aiming to address the following key research questions: RQ1: How does T2S compare in performance to existing state-of-the-art methods given fragment-level captions? [...] 4.2 Performance Comparison on Fragment-Level Descriptions (RQ1) [...] 4.3 Performance Comparison on Point and Instance-Level Descriptions (RQ2) [...] 4.4 Ablation Study (RQ3) [...] 4.5 Parameter Sensitivity (RQ4) [...] 4.6 Data Scarcity (RQ5) |
| Researcher Affiliation | Academia | Yunfeng Ge1,2, Jiawei Li2,7, Yiji Zhao3, Haomin Wen4, Zhao Li5, Meikang Qiu6, Hongyan Li1, Ming Jin2 and Shirui Pan2 1School of Telecommunications Engineering, Xidian University 2School of Information and Communication Technology, Griffith University 3School of Information Science and Engineering, Yunnan University 4Carnegie Mellon University 5College of Computer Science and Technology, Zhejiang University 6School of Computer and Cyber Sciences, Augusta University 7The Hong Kong University of Science and Technology (Guangzhou) |
| Pseudocode | Yes | Algorithm 1 Interleaved Training for Mixed Datasets |
| Open Source Code | Yes | All resources have been made available: https://github.com/WinfredGe/T2S |
| Open Datasets | Yes | To address these challenges, we introduce a new fragment-level dataset, TSFragment-600K, containing over 600,000 high-resolution fragment-level text-time series pairs, which serves as a foundation for exploring T2S generation. [...] All resources have been made available [https://github.com/WinfredGe/T2S] [...] Point-Level Dataset: The Time-MMD dataset [Liu et al., 2024a] links individual time series points with corresponding textual news [...] Instance-Level Dataset: SUSHI, a simulated dataset [Kawaguchi et al., 2025], comprises 2,800 samples generated from 15 pre-defined functions. |
| Dataset Splits | Yes | In this study, we mixed data with lengths of 24, 48, and 96, then trained them in a unified framework. During sampling, arbitrary-length data can be generated within a specified range. [...] For interleaved training, arbitrary lengths of {24, 48, 96} were selected, with evaluations performed separately for each length. [...] For the instance-level dataset, SUSHI is used for training and inference with a fixed length of 2048. For point-level training, arbitrary lengths of {24, 48, 96} were selected for interleaved training. |
| Hardware Specification | No | The paper does not explicitly mention specific hardware details such as GPU or CPU models, memory, or other computational resources used for running the experiments. |
| Software Dependencies | No | The paper mentions models like GPT-4o-mini and Llama3.1-8b as baselines or for dataset generation but does not provide specific version numbers for software libraries or environments used to implement T2S itself. |
| Experiment Setup | Yes | T2S adopts a classifier-free guidance framework, which does not rely on explicit class labels for conditioning. [...] The formula is: ũθ(zt, t, C) = (1 + δ)uθ(zt, t, C) − δuθ(zt, t), (4) [...] 4.5 Parameter Sensitivity (RQ4) [...] we explored the sensitivity of the flow matching diffusion model to key inference parameters: classifier-free guidance scales (CFG) and generation time steps, evaluated using MRR@10. Figure 4 shows a heatmap illustrating performance impact, with yellow regions yielding superior results and green areas reflecting suboptimal performance. Notably, the model achieves higher MRR@10 scores within the range of CFG scores between 7 and 10 and generation time steps between 20 and 50. |
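The classifier-free guidance combination quoted in the Experiment Setup row (Eq. 4) can be sketched in a few lines. This is a minimal illustration of the guided-velocity formula only, not the paper's implementation; the function name and the use of plain Python lists in place of model tensors are our own choices.

```python
def cfg_velocity(u_cond, u_uncond, delta):
    """Classifier-free guidance (Eq. 4 in the paper):
    u_tilde = (1 + delta) * u_cond - delta * u_uncond,
    where u_cond is the text-conditioned velocity prediction and
    u_uncond the unconditional one. delta is the guidance scale."""
    return [(1 + delta) * c - delta * u for c, u in zip(u_cond, u_uncond)]

# delta = 0 reduces to the purely conditional prediction
print(cfg_velocity([1.0, 2.0], [0.5, 1.0], 0.0))  # [1.0, 2.0]
```

Per the sensitivity analysis quoted above, the paper reports the best MRR@10 for guidance scales roughly between 7 and 10.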
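The Dataset Splits row describes interleaved training over mixed series lengths {24, 48, 96} in a unified framework (Algorithm 1 in the paper). A minimal sketch of one plausible scheduling strategy, assuming a simple round-robin over length buckets (the paper may sample buckets differently):

```python
def interleaved_schedule(lengths, steps):
    """Round-robin over series-length buckets so that a single unified
    model sees every resolution during training. `lengths` is the set
    of mixed series lengths (e.g. [24, 48, 96]); the return value is
    the bucket chosen at each training step."""
    return [lengths[step % len(lengths)] for step in range(steps)]

print(interleaved_schedule([24, 48, 96], 6))  # [24, 48, 96, 24, 48, 96]
```

At inference, the paper states that data of arbitrary length within the trained range can then be generated, with evaluation performed separately for each length.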