HOPE for a Robust Parameterization of Long-memory State Space Models
Authors: Annan Yu, Michael W. Mahoney, N. Benjamin Erichson
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | When benchmarked against HiPPO-initialized models such as S4 and S4D, an SSM parameterized by Hankel operators demonstrates improved performance on Long-Range Arena (LRA) tasks. Moreover, our new parameterization endows the SSM with non-decaying memory within a fixed time window, which is empirically corroborated by a sequential CIFAR-10 task with padded noise. |
| Researcher Affiliation | Academia | 1 Center for Applied Mathematics, Cornell University, Ithaca, NY 14853, USA 2 Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA 3 International Computer Science Institute, Berkeley, CA 94704, USA 4 Department of Statistics, University of California at Berkeley, Berkeley, CA 94720, USA |
| Pseudocode | Yes | Algorithm 1: Computing the output of an LTI system parameterized by its Hankel matrix. Input: an input sequence u ∈ ℝ^L, the Markov parameters of a Hankel matrix h ∈ ℂ^n, and a sampling period Δt > 0. Output: the output y ∈ ℝ^L of the LTI system defined by h given input u and sampling period Δt. 1: ω ← exp(2πi · (0:(L−1))/L) {create FFT nodes} 2: s ← (ω − 1)./(ω + 1) {convert to the s-domain, where ./ is the entrywise division} 3: s ← s/Δt {scale the frequency domain in the s-plane} 4: ω ← (1 + s)./(1 − s) {convert back to the z-plane} 5: g ← zeros(L) {store samples of the transfer function} 6: for i = 0 : (n−1) do 7: g ← g + h_i · (ω.^(−i−1)) {compute the ith moment, where .^ is the entrywise power} 8: end for 9: y ← Re(iFFT(FFT(u) ⊙ g)) {⊙ is the entrywise (i.e., Hadamard) product} |
| Open Source Code | No | Our codes are adapted from the code associated with the original S4 and S4D papers (Gu et al., 2022b;a) (Apache License, Version 2.0). |
| Open Datasets | Yes | We train an S4D model to learn the sCIFAR-10 image classification task (Krizhevsky et al., 2009; Tay et al., 2021). ... We test the performance of a full-scale HOPE-SSM on the Long-Range Arena (LRA) tasks. |
| Dataset Splits | No | For each flattened picture in the sCIFAR-10 dataset, which contains 1024 vectors of length 3, we append a random sequence of 1024 vectors of length 3 to the end of it. The goal is still to classify an image by its first 1024 pixels. We call this task noise-padded sCIFAR-10. |
| Hardware Specification | Yes | All experiments are done on an NVIDIA A30 Tensor Core GPU with 24 GB of memory. |
| Software Dependencies | No | Our codes are adapted from the code associated with the original S4 and S4D papers (Gu et al., 2022b;a) (Apache License, Version 2.0). |
| Experiment Setup | Yes | We use the same model hyperparameters as the S4D models in section 3. In particular, the Hankel matrices in this model are 64-by-64. We randomly initialize the Hankel matrix and do not set a smaller learning rate for the Hankel matrix entries h, i.e., all model parameters except for Δt have the same learning rate. ... When we parameterize the LTI systems using A, B, C, and D, we assign a learning rate of 0.001 to A and of 0.01 to the rest. ... Table 2: Configurations of the HOPE-SSM model, where DO, LR, BS, and WD stand for dropout rate, learning rate, batch size, and weight decay, respectively. |
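The Algorithm 1 pseudocode quoted in the Pseudocode row can be sketched in NumPy as follows. This is a minimal illustrative transcription, not the authors' released code; the function name `hankel_lti_output` and the variable names are assumptions made here for readability.

```python
import numpy as np

def hankel_lti_output(u, h, dt):
    """Sketch of Algorithm 1: output of an LTI system parameterized by the
    Markov parameters h of its Hankel matrix (illustrative transcription).

    u  : real input sequence of length L
    h  : complex Markov parameters, length n
    dt : sampling period (Delta t > 0)
    """
    L = len(u)
    # Step 1: FFT nodes on the unit circle
    omega = np.exp(2j * np.pi * np.arange(L) / L)
    # Step 2: bilinear map to the s-domain (entrywise division)
    s = (omega - 1) / (omega + 1)
    # Step 3: scale the frequency domain in the s-plane
    s = s / dt
    # Step 4: map back to the z-plane
    omega = (1 + s) / (1 - s)
    # Steps 5-8: sample the transfer function, g(w) = sum_i h_i * w^(-i-1)
    g = np.zeros(L, dtype=complex)
    for i, hi in enumerate(h):
        g += hi * omega ** (-i - 1)
    # Step 9: circular convolution via FFT; keep the real part
    return np.real(np.fft.ifft(np.fft.fft(u) * g))
```

Because every step is linear in `u`, scaling the input scales the output, which gives a quick sanity check of a transcription like this one.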