Tuning Frequency Bias of State Space Models
Authors: Annan Yu, Dongwei Lyu, Soon Hoe Lim, Michael W Mahoney, N. Benjamin Erichson
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using an image-denoising task, we empirically show that we can strengthen, weaken, or even reverse the frequency bias using both mechanisms. By tuning the frequency bias, we can also improve SSMs' performance on learning long-range sequences, averaging 88.26% accuracy on the Long-Range Arena (LRA) benchmark tasks. Contribution. Here are our main contributions: (3) We empirically demonstrate the effectiveness of our tuning strategies using an image-denoising task. We also show that tuning the frequency bias helps an S4D model achieve state-of-the-art performance on the Long-Range Arena tasks, and we provide ablation studies. |
| Researcher Affiliation | Academia | 1 Center for Applied Mathematics, Cornell University, Ithaca, NY 14853, USA 2 Data Science Institute, University of Chicago, Chicago, IL 60637, USA 3 Department of Mathematics, KTH Royal Institute of Technology, Stockholm, Sweden 4 Nordita, KTH Royal Institute of Technology and Stockholm University, Stockholm, Sweden 5 Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA 6 International Computer Science Institute, Berkeley, CA 94704, USA 7 Department of Statistics, University of California at Berkeley, Berkeley, CA 94720, USA EMAIL, EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes methods and derivations mathematically but does not include any explicitly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | No | The paper references an S4 implementation: "Albert Gu, Karan Goel, and Christopher Ré. S4. https://github.com/state-spaces/s4, 2021a." However, this is a citation for a related tool/library (S4), not an explicit statement that the authors are releasing the source code for the specific methodology (tuning strategies) described in this paper. |
| Open Datasets | Yes | Using an image-denoising task, we empirically show that we can strengthen, weaken, or even reverse the frequency bias using both mechanisms. By tuning the frequency bias, we can also improve SSMs' performance on learning long-range sequences, averaging 88.26% accuracy on the Long-Range Arena (LRA) benchmark tasks (Tay et al., 2021). We now provide an example of how our two mechanisms allow us to tune frequency bias. In this example, we train an SSM to denoise an image in the CelebA dataset (Liu et al., 2015). (III) Ablation Studies. We perform ablation studies of our two tuning strategies by training a smaller S4D model to learn the grayscale sCIFAR-10 task. I.2 TUNING FREQUENCY BIAS IN MOVING MNIST VIDEO PREDICTION. We apply the model to predict movies from the Moving MNIST dataset (Srivastava et al., 2015). |
| Dataset Splits | Yes | Equipped with our two tuning strategies, a simple S4D model can be trained to average 88.26% accuracy on the Long-Range Arena (LRA) benchmark tasks (Tay et al., 2021). (III) Ablation Studies. We perform ablation studies of our two tuning strategies by training a smaller S4D model to learn the grayscale sCIFAR-10 task. From Figure 5, we obtain better performance when we slightly increase α or decrease β. I.2 TUNING FREQUENCY BIAS IN MOVING MNIST VIDEO PREDICTION. We apply the model to predict movies from the Moving MNIST dataset (Srivastava et al., 2015). In our experiment, we slightly modify the movies by coloring the two digits. In particular, every movie contains two moving digits: a fast-moving red one and a slow-moving blue one. Table 3: Test accuracies in the Long-Range Arena of different variants of SSMs. An entry is left blank if no result is found. The row labeled "Ours" stands for the S4D model equipped with our two tuning strategies. Experiments were run with 5 random seeds, and the medians and standard deviations are reported. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using S4D and ConvS5 models, and cites an S4 GitHub repository, but does not specify any software libraries or frameworks with version numbers (e.g., Python 3.x, PyTorch 1.x, CUDA version). |
| Experiment Setup | Yes | H.2 LONG-RANGE ARENA In this section, we present the hyperparameters of our models trained on the Long-Range Arena tasks. Our model architecture and hyperparameters are almost identical to those of the S4D models reported in Gu et al. (2022a), with only two exceptions: for the ListOps experiment, we set n = 2 instead of n = 64, aligning with Smith et al. (2023); for the Path-X experiment, we set d_model = 128 to reduce the computational burden. We do not report the dropout rates since they are set to be the same as those in Gu et al. (2022a). Also, we made β a trainable parameter. Table 4: Configurations of our S4D model, where LR, BS, and WD stand for learning rate, batch size, and weight decay, respectively. |
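To make the two tuning mechanisms quoted above concrete, here is a minimal NumPy sketch of a diagonal SSM (S4D-style) convolution kernel with two illustrative knobs: an `alpha` that scales the imaginary parts of the state-matrix eigenvalues at initialization, and a `beta` that reweights each mode by a power of its frequency, mimicking a Sobolev-norm-style filter. The function names, the exact eigenvalue initialization, and the reweighting formula are our assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def init_s4d_eigenvalues(n_states, alpha=1.0):
    """S4D-Lin-style diagonal initialization: lambda_n = -1/2 + i*pi*n.

    alpha scales the imaginary parts (oscillation frequencies):
    alpha > 1 biases toward higher frequencies, alpha < 1 toward lower.
    (alpha is an illustrative name for the scaling hyperparameter.)
    """
    n = np.arange(n_states)
    return -0.5 + 1j * np.pi * n * alpha

def reweighted_ssm_kernel(eigenvalues, B, C, dt, length, beta=0.0):
    """Discretize a diagonal SSM with zero-order hold and compute its
    length-L convolution kernel, reweighting mode n by
    (1 + |Im(lambda_n)|)^beta.

    beta > 0 amplifies high-frequency modes, beta < 0 damps them;
    the exact filter in the paper may differ.
    """
    weights = (1.0 + np.abs(eigenvalues.imag)) ** beta
    lam_d = np.exp(eigenvalues * dt)           # discrete-time poles
    B_d = (lam_d - 1.0) / eigenvalues * B      # ZOH input matrix
    k = np.arange(length)
    modes = lam_d[:, None] ** k[None, :]       # (n_states, length)
    # kernel[t] = sum_n C_n * w_n * B_d_n * lam_d_n^t
    return np.real((C * weights * B_d) @ modes)
```

Making β trainable, as the paper states it did, would amount to registering it as a learnable parameter in the deep-learning framework so the reweighting is adjusted by gradient descent alongside the other SSM parameters.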