HOPE for a Robust Parameterization of Long-memory State Space Models

Authors: Annan Yu, Michael W. Mahoney, N. Benjamin Erichson

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | When benchmarked against HiPPO-initialized models such as S4 and S4D, an SSM parameterized by Hankel operators demonstrates improved performance on Long-Range Arena (LRA) tasks. Moreover, our new parameterization endows the SSM with non-decaying memory within a fixed time window, which is empirically corroborated by a sequential CIFAR-10 task with padded noise.
Researcher Affiliation | Academia | 1 Center for Applied Mathematics, Cornell University, Ithaca, NY 14853, USA; 2 Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; 3 International Computer Science Institute, Berkeley, CA 94704, USA; 4 Department of Statistics, University of California at Berkeley, Berkeley, CA 94720, USA
Pseudocode | Yes | Algorithm 1: Computing the output of an LTI system parameterized by its Hankel matrix.
Input: an input sequence u ∈ R^L, the Markov parameters of a Hankel matrix h ∈ C^n, and a sampling period Δt > 0.
Output: the output y ∈ R^L of the LTI system defined by h, given input u and sampling period Δt.
1: ω ← exp(2πi · (0:(L−1))/L)  {create FFT nodes}
2: s ← (ω − 1) ./ (ω + 1)  {convert to the s-domain, where ./ is the entrywise division}
3: s ← s/Δt  {scale the frequency domain in the s-plane}
4: ω ← (1 + s) ./ (1 − s)  {convert back to the z-plane}
5: g ← zeros(L)  {store samples of the transfer function}
6: for i = 0 : (n−1) do
7:   g ← g + h_i · (ω.^(−i−1))  {compute the ith moment, where .^ is the entrywise power}
8: end for
9: y ← Re(iFFT(FFT(u) ⊙ g))  {⊙ is the entrywise (i.e., Hadamard) product}
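The quoted pseudocode maps cleanly onto NumPy's FFT routines. Below is a minimal, illustrative sketch of that procedure; the function name `hankel_lti_output` is my own, and this is not the authors' released implementation:

```python
import numpy as np

def hankel_lti_output(u, h, dt):
    """Sketch of Algorithm 1: compute the output of an LTI system from its
    Markov parameters h (the entries defining the Hankel matrix), an input
    sequence u of length L, and a sampling period dt > 0."""
    L = len(u)
    # FFT nodes on the unit circle (z-plane)
    omega = np.exp(2j * np.pi * np.arange(L) / L)
    # bilinear transform to the s-domain, scaled by the sampling period
    s = (omega - 1) / (omega + 1) / dt
    # map back to the z-plane
    omega = (1 + s) / (1 - s)
    # sample the transfer function G(z) = sum_i h_i * z^{-(i+1)}
    g = np.zeros(L, dtype=complex)
    for i, hi in enumerate(h):
        g += hi * omega ** (-(i + 1))
    # apply the system via entrywise multiplication in the frequency domain
    return np.fft.ifft(np.fft.fft(u) * g).real
```

With dt = 1 the bilinear warp is the identity, so the output reduces to a circular convolution of u with the kernel (0, h_0, h_1, ...), which gives a quick sanity check on the implementation.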
Open Source Code | No | Our codes are adapted from the code associated with the original S4 and S4D papers (Gu et al., 2022b;a) (Apache License, Version 2.0).
Open Datasets | Yes | We train an S4D model to learn the sCIFAR-10 image classification task (Krizhevsky et al., 2009; Tay et al., 2021). ... We test the performance of a full-scale HOPE-SSM on the Long-Range Arena (LRA) tasks.
Dataset Splits | No | For each flattened picture in the sCIFAR-10 dataset, which consists of 1024 vectors of length 3, we append a random sequence of 1024 vectors of length 3 to the end of it. The goal is still to classify an image by its first 1024 pixels. We call this task noise-padded sCIFAR-10.
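The noise-padding construction quoted above can be sketched in a few lines. The quoted text only says "a random sequence", so the Gaussian noise distribution and the function name below are assumptions for illustration:

```python
import numpy as np

def pad_with_noise(image_seq, rng=None):
    """Append a random sequence of the same shape to a flattened sCIFAR-10
    image (1024 vectors of length 3), doubling the sequence length to 2048.
    The classification label still depends only on the first 1024 pixels."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(image_seq.shape)  # assumed distribution
    return np.concatenate([image_seq, noise], axis=0)
```

A model with decaying memory must carry the image content across the 1024 noise steps, which is what makes this task a probe of long-range memory.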
Hardware Specification | Yes | All experiments are done on an NVIDIA A30 Tensor Core GPU with 24 GB of memory.
Software Dependencies | No | Our codes are adapted from the code associated with the original S4 and S4D papers (Gu et al., 2022b;a) (Apache License, Version 2.0).
Experiment Setup | Yes | We use the same model hyperparameters as the S4D models in Section 3. In particular, the Hankel matrices in this model are 64-by-64. We randomly initialize the Hankel matrix and do not set a smaller learning rate for the Hankel matrix entries h, i.e., all model parameters except for Δt have the same learning rate. ... When we parameterize the LTI systems using A, B, C, and D, we assign a learning rate of 0.001 to A and of 0.01 to the rest. ... Table 2: Configurations of the HOPE-SSM model, where DO, LR, BS, and WD stand for dropout rate, learning rate, batch size, and weight decay, respectively.
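The per-parameter learning rates quoted above (0.001 for A, 0.01 for the rest under an (A, B, C, D) parameterization) correspond to standard PyTorch optimizer parameter groups. The module and attribute names below are hypothetical placeholders, not the authors' code:

```python
import torch

class TinySSMLayer(torch.nn.Module):
    """Toy stand-in for an (A, B, C, D)-parameterized LTI layer."""
    def __init__(self, n=4):
        super().__init__()
        self.A = torch.nn.Parameter(torch.randn(n, n))
        self.B = torch.nn.Parameter(torch.randn(n, 1))
        self.C = torch.nn.Parameter(torch.randn(1, n))
        self.D = torch.nn.Parameter(torch.randn(1))

layer = TinySSMLayer()
optimizer = torch.optim.AdamW(
    [
        {"params": [layer.A], "lr": 1e-3},                    # slower rate for A
        {"params": [layer.B, layer.C, layer.D], "lr": 1e-2},  # remaining parameters
    ],
    weight_decay=0.0,
)
```

Under the Hankel parameterization, by contrast, the quoted setup uses a single learning rate for all entries of h (only Δt is treated specially), so no such grouping is needed.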