LSCD: Lomb–Scargle Conditioned Diffusion for Time Series Imputation
Authors: Elizabeth Fons, Alejandro Sztrajman, Yousef El-Laham, Luciana Ferrer, Svitlana Vyetrenko, Manuela Veloso
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on synthetic and real-world benchmarks demonstrate that our method recovers missing data more accurately than purely time-domain baselines, while simultaneously producing consistent frequency estimates. Crucially, our method can be easily integrated into learning frameworks, enabling broader adoption of spectral guidance in machine learning approaches involving incomplete or irregular data. |
| Researcher Affiliation | Collaboration | ¹J.P. Morgan AI Research, ²University of Cambridge, ³University of Buenos Aires, ⁴CONICET. Correspondence to: Elizabeth Fons <EMAIL>. |
| Pseudocode | Yes | The following listing provides a PyTorch implementation of the Lomb–Scargle periodogram, designed to efficiently compute spectral estimates for irregularly sampled time series. This implementation is fully differentiable, allowing seamless integration into learning-based models for gradient-based optimization. (Listing 1: Batch Lomb-Scargle Periodogram with Masking, beginning `import torch` and ending `return P`.) |
| Open Source Code | Yes | To facilitate broader adoption, we provide a differentiable implementation which can be seamlessly integrated into learning pipelines. ... E.1. Lomb–Scargle implementation. The following listing provides a PyTorch implementation of the Lomb–Scargle periodogram, designed to efficiently compute spectral estimates for irregularly sampled time series. This implementation is fully differentiable, allowing seamless integration into learning-based models for gradient-based optimization. |
| Open Datasets | Yes | The first dataset, PhysioNet (Silva et al., 2012), comprises 4,000 health measurements from ICU patients, covering 35 features. ... The second dataset consists of PM2.5 air quality measurements collected from 36 stations in Beijing over a 12-month period (Yi et al., 2016). |
| Dataset Splits | Yes | For evaluation, we hold out 10%, 50%, and 90% of the observed values as ground truth and assess imputation quality on these missing entries. ... In practice, during training a conditional mask m^co ∈ {0, 1}^{K×L} is introduced to artificially split the observed values into x_0^co = m^co ⊙ X and x_0^ta = (M - m^co) ⊙ X, in order to train the conditional denoising function. |
| Hardware Specification | Yes | All computations in this analysis were performed using a g5.2xlarge AWS instance (AMD EPYC 7R32 CPU, with an Nvidia A10G 24 GB GPU). |
| Software Dependencies | No | The paper mentions 'PyTorch' implicitly through code examples and 'PyGrinder' for dataset generation, but no specific version numbers are provided for these software components. |
| Experiment Setup | Yes | We train for 400 epochs and select the best checkpoint via a validation set. The final z_S from our spectral encoder is concatenated with the other conditioning signals (e.g. partial observations) at every denoising step to guide the model. Diffusion steps: T_max = 50; batch size: 16; learning rate: 1e-3. |
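The table above quotes the paper's differentiable, batched Lomb–Scargle periodogram (Listing 1) without reproducing its body. The sketch below is a minimal reconstruction of that idea, not the authors' actual code: the function name, argument layout, and frequency grid are assumptions. It evaluates the standard Lomb–Scargle estimate at a set of angular frequencies while ignoring missing entries via a binary mask, using only differentiable tensor ops so gradients flow back to the input values.

```python
import torch

def lomb_scargle(t, x, mask, freqs, eps=1e-8):
    """Masked Lomb-Scargle periodogram for a batch of irregular series.

    t:     (B, L) sample times
    x:     (B, L) values; entries where mask == 0 are ignored
    mask:  (B, L) 1.0 for observed, 0.0 for missing
    freqs: (F,)   angular frequencies to evaluate at
    Returns (B, F) periodogram values; differentiable w.r.t. x.
    """
    # Masked mean, then center the observed values.
    n = mask.sum(dim=-1, keepdim=True).clamp(min=1.0)
    xbar = (x * mask).sum(dim=-1, keepdim=True) / n
    xc = (x - xbar) * mask

    # Broadcast omega * t to shape (B, F, L).
    wt = freqs.view(1, -1, 1) * t.unsqueeze(1)
    m = mask.unsqueeze(1)

    # Time offset tau that decorrelates the sin/cos bases (masked sums).
    s2 = (m * torch.sin(2 * wt)).sum(-1)
    c2 = (m * torch.cos(2 * wt)).sum(-1)
    tau = 0.5 * torch.atan2(s2, c2) / freqs.clamp(min=eps)

    arg = wt - freqs.view(1, -1, 1) * tau.unsqueeze(-1)
    cos_a, sin_a = torch.cos(arg), torch.sin(arg)

    xc_ = xc.unsqueeze(1)
    num_c = (xc_ * cos_a).sum(-1) ** 2
    num_s = (xc_ * sin_a).sum(-1) ** 2
    den_c = (m * cos_a ** 2).sum(-1).clamp(min=eps)
    den_s = (m * sin_a ** 2).sum(-1).clamp(min=eps)
    return 0.5 * (num_c / den_c + num_s / den_s)
```

Because every operation is a smooth tensor op, the periodogram can sit inside a training loss or a conditioning encoder, which is the property the paper emphasizes for spectral guidance.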
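The Dataset Splits row describes a self-supervised trick: during training, a conditional mask m^co splits the observed values into a conditioning part x_0^co = m^co ⊙ X and a held-out target part x_0^ta = (M - m^co) ⊙ X. The snippet below is a hypothetical sketch of that split; the function name `split_observed` and the Bernoulli masking rate `p_cond` are assumptions, not details from the paper.

```python
import torch

def split_observed(X, M, p_cond=0.5, generator=None):
    """Split observed entries into conditioning and target sets.

    X: (K, L) data tensor.
    M: (K, L) observation mask, 1.0 where X is observed.
    Each observed entry joins the conditioning set with prob p_cond.
    Returns (x_co, x_ta, m_co).
    """
    r = torch.rand(X.shape, generator=generator)
    m_co = (r < p_cond).float() * M      # m^co is a subset of M
    x_co = m_co * X                      # values the model conditions on
    x_ta = (M - m_co) * X                # held-out targets for the loss
    return x_co, x_ta, m_co
```

The denoiser is then trained to reconstruct x_ta given x_co, so at test time the same network can impute genuinely missing entries conditioned on whatever is observed.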