Tuning Frequency Bias of State Space Models
Authors: Annan Yu, Dongwei Lyu, Soon Hoe Lim, Michael W Mahoney, N. Benjamin Erichson
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using an image-denoising task, we empirically show that we can strengthen, weaken, or even reverse the frequency bias using both mechanisms. By tuning the frequency bias, we can also improve SSMs' performance on learning long-range sequences, averaging 88.26% accuracy on the Long-Range Arena (LRA) benchmark tasks. Contribution. Here are our main contributions: (3) We empirically demonstrate the effectiveness of our tuning strategies using an image-denoising task. We also show that tuning the frequency bias helps an S4D model achieve state-of-the-art performance on the Long-Range Arena tasks, and we provide ablation studies. |
| Researcher Affiliation | Academia | 1 Center for Applied Mathematics, Cornell University, Ithaca, NY 14853, USA 2 Data Science Institute, University of Chicago, Chicago, IL 60637, USA 3 Department of Mathematics, KTH Royal Institute of Technology, Stockholm, Sweden 4 Nordita, KTH Royal Institute of Technology and Stockholm University, Stockholm, Sweden 5 Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA 6 International Computer Science Institute, Berkeley, CA 94704, USA 7 Department of Statistics, University of California at Berkeley, Berkeley, CA 94720, USA EMAIL, EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes methods and derivations mathematically but does not include any explicitly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | No | The paper references an S4 implementation: "Albert Gu, Karan Goel, and Christopher Ré. S4. https://github.com/state-spaces/s4, 2021a." However, this is a citation for a related tool/library (S4), not an explicit statement that the authors are releasing the source code for the specific methodology (tuning strategies) described in this paper. |
| Open Datasets | Yes | Using an image-denoising task, we empirically show that we can strengthen, weaken, or even reverse the frequency bias using both mechanisms. By tuning the frequency bias, we can also improve SSMs' performance on learning long-range sequences, averaging 88.26% accuracy on the Long-Range Arena (LRA) benchmark tasks (Tay et al., 2021). We now provide an example of how our two mechanisms allow us to tune frequency bias. In this example, we train an SSM to denoise an image in the CelebA dataset (Liu et al., 2015). (III) Ablation Studies. We perform ablation studies of our two tuning strategies by training a smaller S4D model to learn the grayscale sCIFAR-10 task. I.2 TUNING FREQUENCY BIAS IN MOVING MNIST VIDEO PREDICTION. We apply the model to predict movies from the Moving MNIST dataset (Srivastava et al., 2015). |
| Dataset Splits | Yes | Equipped with our two tuning strategies, a simple S4D model can be trained to average 88.26% accuracy on the Long-Range Arena (LRA) benchmark tasks (Tay et al., 2021). (III) Ablation Studies. We perform ablation studies of our two tuning strategies by training a smaller S4D model to learn the grayscale sCIFAR-10 task. From Figure 5, we obtain better performance when we slightly increase α or decrease β. I.2 TUNING FREQUENCY BIAS IN MOVING MNIST VIDEO PREDICTION. We apply the model to predict movies from the Moving MNIST dataset (Srivastava et al., 2015). In our experiment, we slightly modify the movies by coloring the two digits. In particular, every movie contains two moving digits: a fast-moving red one and a slow-moving blue one. Table 3: Test accuracies in the Long-Range Arena of different variants of SSMs. An entry is left blank if no result is found. The row labeled "Ours" stands for the S4D model equipped with our two tuning strategies. Experiments were run with 5 random seeds, and the medians and standard deviations are reported. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using S4D and ConvS5 models, and cites an S4 GitHub repository, but does not specify any software libraries or frameworks with version numbers (e.g., Python 3.x, PyTorch 1.x, CUDA version). |
| Experiment Setup | Yes | H.2 LONG-RANGE ARENA In this section, we present the hyperparameters of our models trained on the Long-Range Arena tasks. Our model architecture and hyperparameters are almost identical to those of the S4D models reported in Gu et al. (2022a), with only two exceptions: for the ListOps experiment, we set n = 2 instead of n = 64, aligning with Smith et al. (2023); for the Path-X experiment, we set d_model = 128 to reduce the computational burden. We do not report the dropout rates since they are set to be the same as those in Gu et al. (2022a). Also, we made β a trainable parameter. Table 4: Configurations of our S4D model, where LR, BS, and WD stand for learning rate, batch size, and weight decay, respectively. |
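To make the two tuning mechanisms quoted above concrete, here is a minimal NumPy sketch of a diagonal SSM (S4D-style) convolution kernel with two illustrative knobs: an `alpha` that scales the imaginary parts of the state-matrix eigenvalues at initialization, and a `beta` that reweights each mode by a power of its frequency, mimicking a Sobolev-norm-style filter. The function names, the exact eigenvalue initialization, and the reweighting formula are our assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def init_s4d_eigenvalues(n_states, alpha=1.0):
    """S4D-Lin-style diagonal initialization: lambda_n = -1/2 + i*pi*n.

    alpha scales the imaginary parts (oscillation frequencies):
    alpha > 1 biases toward higher frequencies, alpha < 1 toward lower.
    (alpha is an illustrative name for the scaling hyperparameter.)
    """
    n = np.arange(n_states)
    return -0.5 + 1j * np.pi * n * alpha

def reweighted_ssm_kernel(eigenvalues, B, C, dt, length, beta=0.0):
    """Discretize a diagonal SSM with zero-order hold and compute its
    length-L convolution kernel, reweighting mode n by
    (1 + |Im(lambda_n)|)^beta.

    beta > 0 amplifies high-frequency modes, beta < 0 damps them;
    the exact filter in the paper may differ.
    """
    weights = (1.0 + np.abs(eigenvalues.imag)) ** beta
    lam_d = np.exp(eigenvalues * dt)           # discrete-time poles
    B_d = (lam_d - 1.0) / eigenvalues * B      # ZOH input matrix
    k = np.arange(length)
    modes = lam_d[:, None] ** k[None, :]       # (n_states, length)
    # kernel[t] = sum_n C_n * w_n * B_d_n * lam_d_n^t
    return np.real((C * weights * B_d) @ modes)
```

Making β trainable, as the paper states it did, would amount to registering it as a learnable parameter in the deep-learning framework so the reweighting is adjusted by gradient descent alongside the other SSM parameters.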