reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Population Aware Diffusion for Time Series Generation

Authors: Yang Li, Han Meng, Zhenyu Bi, Ingolv T. Urnes, Haipeng Chen

AAAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical results in major benchmark datasets show that PaD-TS can improve the average CC distribution shift score between real and synthetic data by 5.9x while maintaining a performance comparable to state-of-the-art models on individual-level authenticity. In this section, we describe our experiment settings and evaluate the TS generation quality of PaD-TS across different domains and sequence lengths. The experiment results consist of quantitative and qualitative results in terms of individual authenticity and population-level property preservation. We also perform an ablation study to demonstrate the effectiveness of each proposed component and the effect of the hyperparameter α.
Researcher Affiliation	Collaboration	Yang Li (1), Han Meng (1), Zhenyu Bi (2), Ingolv T. Urnes (3), Haipeng Chen (1). (1) William & Mary; (2) Virginia Tech; (3) Generated Health. (1) EMAIL, (2) EMAIL, (3) EMAIL
Pseudocode	Yes	Algorithm 1: PaD-TS training procedure. Input: Original TS data, FD function f, epochs E, and total diffusion steps T. Output: Trained PaD-TS model θ. 1: for i = 1 to E do; 2: Sample a mini-batch of x_0 with b samples; 3: Sample t_1 from [1, T − 1] (SSS); 4: Let t = [t_1, ..., t_1]; 5: Get x̂_0 using the PaD-TS model (PAT objective); 6: Find all FD distributions for x̂_0 and x_0; 7: Calculate L_0 and L_pop; 8: Update θ with gradient ∇_θ(L_0 + L_pop); 9: end for; 10: return Model θ
Open Source Code	Yes	Code: https://github.com/wmd3i/PaD-TS
Open Datasets	Yes	Datasets: We use three major benchmark datasets, spanning domains such as physics, finance, and synthetic time series. (1) Sines (Yoon, Jarrett, and Van der Schaar 2019): Synthetic sine wave time series data that can be sampled based on parameters. (2) Stocks (Yoon, Jarrett, and Van der Schaar 2019): Google stocks history time series data includes 5 features such as Open, Close, Volume, etc. (3) Energy (Candanedo, Feldheim, and Deramaix 2017): Home appliances energy consumption time series data includes 28 features such as energy consumption, room temperatures, room humidity levels, and more. Additional MuJoCo (Tunyasuvunakool et al. 2020) and fMRI (Smith et al. 2011) dataset results are available in Appendix E.
Dataset Splits	No	The paper uses well-known benchmark datasets but does not explicitly state the training, validation, and test split percentages or methodology within the provided text. It mentions using 'training set from original and synthetic datasets' and 'test set' but no specific split ratios or details for reproducing the data partitioning.
Hardware Specification	Yes	All experiments are run on a Rocky Linux server with AMD EPYC 7313 CPU, 128 GB of memory, and 2 Nvidia A40 GPUs.
Software Dependencies	No	The paper mentions 'Additional model hyperparameters are provided in Appendix C,' which might contain software dependencies. However, the main text does not explicitly list any specific software components with their version numbers required to reproduce the experiment.
Experiment Setup	Yes	Additional model hyperparameters are provided in Appendix C. In the second ablation study, we train different PaD-TS models with different α values ranging from 0 to 0.05. TS generation with sequence length 24.
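
To make the control flow of Algorithm 1 (Pseudocode row) easier to follow, here is a minimal NumPy sketch of steps 2 to 7 for one mini-batch. It is an illustration under stated assumptions, not the authors' implementation. The function names, the linear-beta noising schedule, the choice of cross-correlation as the function f, and the RBF-kernel MMD used as the distribution distance are all hypothetical stand-ins that the excerpts above do not specify. Step 8 (the gradient update of θ) is omitted because it requires an autodiff framework.

```python
import numpy as np

def same_step_timesteps(batch_size, T, rng):
    """Steps 3-4: draw one diffusion step t1 in [1, T-1] and repeat it for the batch."""
    t1 = int(rng.integers(1, T))  # upper bound is exclusive, so t1 <= T-1
    return np.full(batch_size, t1, dtype=np.int64)

def cross_correlation_features(x):
    """Hypothetical f: per-sample correlation for every pair of channels.

    x has shape (batch, length, channels); the result has shape (batch, n_pairs).
    """
    xc = x - x.mean(axis=1, keepdims=True)
    cov = np.einsum("blc,bld->bcd", xc, xc) / x.shape[1]
    std = np.sqrt(np.diagonal(cov, axis1=1, axis2=2)) + 1e-8
    corr = cov / (std[:, :, None] * std[:, None, :])
    rows, cols = np.triu_indices(x.shape[2], k=1)
    return corr[:, rows, cols]

def rbf_mmd2(a, b, bandwidth=1.0):
    """Biased squared MMD with an RBF kernel between two 1-D samples (assumed distance)."""
    def kernel(u, v):
        return np.exp(-((u[:, None] - v[None, :]) ** 2) / (2.0 * bandwidth ** 2))
    return kernel(a, a).mean() + kernel(b, b).mean() - 2.0 * kernel(a, b).mean()

def padts_style_losses(x0, predict_x0, T, rng):
    """Steps 2-7 for one mini-batch x0; predict_x0(xt, t) stands in for the model."""
    t = same_step_timesteps(x0.shape[0], T, rng)
    # Assumed forward process: standard linear-beta noising of x0 to x_t.
    betas = np.linspace(1e-4, 0.02, T)
    alpha_bar = np.cumprod(1.0 - betas)[t][:, None, None]
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise
    x0_hat = predict_x0(xt, t)                      # step 5
    f_real = cross_correlation_features(x0)         # step 6
    f_fake = cross_correlation_features(x0_hat)
    l0 = float(np.mean((x0_hat - x0) ** 2))         # step 7: individual-level loss
    lpop = float(np.mean([rbf_mmd2(f_real[:, j], f_fake[:, j])
                          for j in range(f_real.shape[1])]))  # population-level loss
    return l0, lpop, t
```

Because every sample in the batch shares the same step t1, the distributions compared in step 6 come from predictions made at a single noise level, which is what makes a batch-level distribution comparison meaningful in this sketch. A training loop would sum the two returned losses and backpropagate through the model.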