Oscillatory State-Space Models
Authors: T. Konstantin Rusch, Daniela Rus
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, our empirical results, spanning a wide range of time-series tasks from mid-range to very long-range classification and regression, as well as long-horizon forecasting, demonstrate that our proposed LinOSS model consistently outperforms state-of-the-art sequence models. Notably, LinOSS outperforms Mamba and LRU by nearly 2x on a sequence modeling task with sequences of length 50k. Code: https://github.com/tk-rusch/linoss. |
| Researcher Affiliation | Academia | T. Konstantin Rusch (MIT), Daniela Rus (MIT) |
| Pseudocode | Yes | Algorithm 1 (full LinOSS model). Input: input sequence u. Output: L-block LinOSS output sequence o. u^0 ← Wenc u + benc (encode input sequence); for l = 1, …, L: y^l ← solution of ODE in (1) with input u^(l-1) via parallel scan; x^l ← C y^l + D u^(l-1) (linear readout in (1)); x^l ← GELU(x^l); u^l ← GLU(x^l) + u^(l-1); end for; o ← Wdec y^L + bdec (decode final LinOSS block output). |
| Open Source Code | Yes | Code: https://github.com/tk-rusch/linoss. |
| Open Datasets | Yes | In the first part of the experiments, we focus on a recently proposed long-range sequential benchmark introduced in Walker et al. (2024). This benchmark focuses on six datasets from the University of East Anglia (UEA) Multivariate Time Series Classification Archive (UEA-MTSCA) (Bagnall et al., 2018), selecting those with the longest sequences for increased difficulty. The sequence lengths thereby range from 400 to almost 18k. To this end, we consider the PPG-DaLiA dataset, a multivariate time series regression dataset designed for heart rate prediction using data collected from a wrist-worn device (Reiss et al., 2019). We consider a weather prediction task introduced in Zhou et al. (2021). |
| Dataset Splits | Yes | More concretely, we use the same pre-selected random seeds for splitting the datasets into training, validation, and testing parts (using 70/15/15 splits), as well as tune our model hyperparameters only on the same prescribed grid. We follow Walker et al. (2024) and divide the data into training, validation, and test sets with a 70/15/15 split for each individual. Thus, we simply follow Gu et al. (2021) by setting up LinOSS as a general sequence-to-sequence model that treats forecasting as a masked sequence-to-sequence transformation. We consider a weather prediction task introduced in Zhou et al. (2021). In this task, several climate variables are predicted into the future based on local climatological data. Here, we focus on the hardest task in Zhou et al. (2021) of predicting the future 720 timesteps (hours) based on the past 720 timesteps. |
| Hardware Specification | Yes | All experiments were conducted on Nvidia Tesla V100 GPUs and Nvidia RTX 4090 GPUs, with the exception of the PPG experiment, which was run on Nvidia Tesla A100 GPUs due to higher memory demands. Note that we used exactly the same GPU architecture as well as the same code and Python libraries as in Walker et al. (2024) to ensure fair comparability, i.e., GPU memory usage and runtime were measured on an Nvidia RTX 4090 GPU for all models. |
| Software Dependencies | No | The code to run the experiments is implemented using the JAX auto-differentiation framework (Bradbury et al., 2018). All experiments were conducted on Nvidia Tesla V100 GPUs and Nvidia RTX 4090 GPUs, with the exception of the PPG experiment, which was run on Nvidia Tesla A100 GPUs due to higher memory demands. |
| Experiment Setup | Yes | The hyperparameters of the models were optimized with the same grid search approach from Walker et al. (2024) for the six datasets in Section 4.1 and the PPG dataset of Section 4.2 to ensure perfect comparability with competing methods, i.e., using the grid: learning rate = {0.00001, 0.0001, 0.001}, number of layers = {2, 4, 6}, number of hidden neurons = {16, 64, 128}, state-space dimension = {16, 64, 256}, include time dimension = {True, False}. Note that for the weather dataset, we performed a random search instead of grid search using the same hyperparameter bounds as before, except that we increased the maximum number of LinOSS blocks from 6 to 8. |
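The block structure in Algorithm 1 above can be sketched in code. The following is a minimal NumPy illustration, not the authors' implementation (which is in JAX): the oscillatory ODE solve of Eq. (1) is replaced by a plain sequential linear recurrence as a stand-in for the parallel scan, and all weights (`A`, `B`, `C`, `D`, `Wg`, the encoder/decoder matrices) are hypothetical random placeholders. Only the encode → L blocks (recurrence, linear readout, GELU, GLU + skip) → decode structure follows the pseudocode.

```python
import numpy as np

rng = np.random.default_rng(0)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def glu(x, W):
    # Gated linear unit: project to 2*d features, sigmoid-gate one half with the other.
    a, g = np.split(x @ W, 2, axis=-1)
    return a / (1.0 + np.exp(-g))

def linoss_block(u_prev, A, B, C, D, Wg):
    # Stand-in for solving the ODE in Eq. (1): a generic linear recurrence
    # y_t = A y_{t-1} + B u_t, run sequentially here (the paper computes the
    # actual oscillatory-ODE solution with a parallel scan).
    T = u_prev.shape[0]
    y = np.zeros((T, A.shape[0]))
    h = np.zeros(A.shape[0])
    for t in range(T):
        h = A @ h + B @ u_prev[t]
        y[t] = h
    x = y @ C.T + u_prev @ D.T      # linear readout in Eq. (1)
    x = gelu(x)
    return glu(x, Wg) + u_prev, y   # GLU plus skip connection

def linoss(u, params, Wenc, benc, Wdec, bdec):
    u_l = u @ Wenc.T + benc         # encode input sequence
    for A, B, C, D, Wg in params:   # L LinOSS blocks
        u_l, y = linoss_block(u_l, A, B, C, D, Wg)
    return y @ Wdec.T + bdec        # decode final block output

# Tiny smoke test: T=32 timesteps, 3 input channels, d=8 hidden, m=4 state, L=2.
T, c, d, m, L = 32, 3, 8, 4, 2
Wenc, benc = rng.normal(size=(d, c)) * 0.1, np.zeros(d)
Wdec, bdec = rng.normal(size=(1, m)) * 0.1, np.zeros(1)
params = [(rng.normal(size=(m, m)) * 0.1, rng.normal(size=(m, d)) * 0.1,
           rng.normal(size=(d, m)) * 0.1, rng.normal(size=(d, d)) * 0.1,
           rng.normal(size=(d, 2 * d)) * 0.1) for _ in range(L)]
o = linoss(rng.normal(size=(T, c)), params, Wenc, benc, Wdec, bdec)
print(o.shape)  # (32, 1)
```

The sequential loop makes the recurrence easy to read; the published code instead uses an associative (parallel) scan over the sequence dimension, which is what makes 50k-length sequences tractable.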
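The grid reported above is small enough to enumerate exactly; as a sketch, the full sweep (parameter names chosen here for illustration) comes to 3 × 3 × 3 × 3 × 2 = 162 configurations per dataset:

```python
import itertools

# Hyperparameter grid from Walker et al. (2024) as quoted above; the key
# names are illustrative, not taken from the released code.
grid = {
    "learning_rate": [1e-5, 1e-4, 1e-3],
    "num_layers": [2, 4, 6],
    "hidden_size": [16, 64, 128],
    "state_dim": [16, 64, 256],
    "include_time": [True, False],
}
configs = [dict(zip(grid, vals)) for vals in itertools.product(*grid.values())]
print(len(configs))  # 162
```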