HOPE for a Robust Parameterization of Long-memory State Space Models
Authors: Annan Yu, Michael W. Mahoney, N. Benjamin Erichson
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | When benchmarked against HiPPO-initialized models such as S4 and S4D, an SSM parameterized by Hankel operators demonstrates improved performance on Long-Range Arena (LRA) tasks. Moreover, our new parameterization endows the SSM with non-decaying memory within a fixed time window, which is empirically corroborated by a sequential CIFAR-10 task with padded noise. |
| Researcher Affiliation | Academia | 1 Center for Applied Mathematics, Cornell University, Ithaca, NY 14853, USA 2 Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA 3 International Computer Science Institute, Berkeley, CA 94704, USA 4 Department of Statistics, University of California at Berkeley, Berkeley, CA 94720, USA |
| Pseudocode | Yes | Algorithm 1: Computing the output of an LTI system parameterized by its Hankel matrix. Input: an input sequence u ∈ ℝ^L, the Markov parameters of a Hankel matrix h ∈ ℂ^n, and a sampling period Δt > 0. Output: the output y ∈ ℝ^L of the LTI system defined by h given input u and sampling period Δt. 1: ω ← exp(2πi · (0:(L−1))/L) {create FFT nodes} 2: s ← (ω − 1)./(ω + 1) {convert to the s-domain, where ./ is the entrywise division} 3: s ← s/Δt {scale the frequency domain in the s-plane} 4: ω ← (1 + s)./(1 − s) {convert back to the z-plane} 5: g ← zeros(L) {store samples of the transfer function} 6: for i = 0 : (n−1) do 7: g ← g + h_i · (ω.^(−i−1)) {compute the ith moment, where .^ is the entrywise power} 8: end for 9: y ← Re(iFFT(FFT(u) ⊙ g)) {⊙ is the entrywise (i.e., Hadamard) product} |
| Open Source Code | No | Our codes are adapted from the code associated with the original S4 and S4D papers (Gu et al., 2022b;a) (Apache License, Version 2.0). |
| Open Datasets | Yes | We train an S4D model to learn the sCIFAR-10 image classification task (Krizhevsky et al., 2009; Tay et al., 2021). ... We test the performance of a full-scale HOPE-SSM on the Long-Range Arena (LRA) tasks. |
| Dataset Splits | No | For each flattened picture in the sCIFAR-10 dataset, which contains 1024 vectors of length 3, we append a random sequence of 1024 vectors of length 3 to the end of it. The goal is still to classify an image by its first 1024 pixels. We call this task noise-padded sCIFAR-10. |
| Hardware Specification | Yes | All experiments are done on an NVIDIA A30 Tensor Core GPU with 24 GB of memory. |
| Software Dependencies | No | Our codes are adapted from the code associated with the original S4 and S4D papers (Gu et al., 2022b;a) (Apache License, Version 2.0). |
| Experiment Setup | Yes | We use the same model hyperparameters as the S4D models in section 3. In particular, the Hankel matrices in this model are 64-by-64. We randomly initialize the Hankel matrix and do not set a smaller learning rate for the Hankel matrix entries h, i.e., all model parameters except for Δt have the same learning rate. ... When we parameterize the LTI systems using A, B, C, and D, we assign a learning rate of 0.001 to A and of 0.01 to the rest. ... Table 2: Configurations of the HOPE-SSM model, where DO, LR, BS, and WD stand for dropout rate, learning rate, batch size, and weight decay, respectively. |
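The Algorithm 1 pseudocode quoted in the Pseudocode row can be sketched in NumPy as follows. This is a minimal illustrative transcription, not the authors' released code; the function name `hankel_lti_output` and the variable names are assumptions made here for readability.

```python
import numpy as np

def hankel_lti_output(u, h, dt):
    """Sketch of Algorithm 1: output of an LTI system parameterized by the
    Markov parameters h of its Hankel matrix (illustrative transcription).

    u  : real input sequence of length L
    h  : complex Markov parameters, length n
    dt : sampling period (Delta t > 0)
    """
    L = len(u)
    # Step 1: FFT nodes on the unit circle
    omega = np.exp(2j * np.pi * np.arange(L) / L)
    # Step 2: bilinear map to the s-domain (entrywise division)
    s = (omega - 1) / (omega + 1)
    # Step 3: scale the frequency domain in the s-plane
    s = s / dt
    # Step 4: map back to the z-plane
    omega = (1 + s) / (1 - s)
    # Steps 5-8: sample the transfer function, g(w) = sum_i h_i * w^(-i-1)
    g = np.zeros(L, dtype=complex)
    for i, hi in enumerate(h):
        g += hi * omega ** (-i - 1)
    # Step 9: circular convolution via FFT; keep the real part
    return np.real(np.fft.ifft(np.fft.fft(u) * g))
```

Because every step is linear in `u`, scaling the input scales the output, which gives a quick sanity check of a transcription like this one.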