UnHiPPO: Uncertainty-aware Initialization for State Space Models
Authors: Marten Lienen, Abdullah Saydemir, Stephan Günnemann
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that our initialization improves the resistance of state-space models to noise both at training and inference time. Our experiments in Section 6 demonstrate how the UnHiPPO initialization improves the robustness of SSMs against noise using the example of LSSL. |
| Researcher Affiliation | Academia | 1Department of Computer Science, Technical University of Munich 2Munich Data Science Institute, Technical University of Munich. Correspondence to: Marten Lienen <EMAIL>. |
| Pseudocode | No | The paper primarily presents mathematical derivations and equations (e.g., Eq. (4), Eq. (6), Eq. (7), Eq. (8), Eq. (12), Eq. (20), Eq. (25)), but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Find our implementation at cs.cit.tum.de/daml/unhippo. |
| Open Datasets | Yes | We evaluate UnLSSL on two sequence classification datasets, the Free Spoken Digits dataset (FSD) (Jackson et al., 2018) and a 10-class subset of the Speech Commands dataset (SC10) (Warden, 2018). |
| Dataset Splits | No | The paper mentions training and test sets and describes data preprocessing such as looping and cutting recordings: "We loop recordings shorter than one second and then cut them so that each sample is a univariate sequence of length 8000." However, it does not specify explicit split percentages, sample counts for train/validation/test sets, or reference standard predefined splits for reproducibility. |
| Hardware Specification | No | The Acknowledgments section mentions the "Munich Center for Machine Learning for providing compute resources." However, this is a general statement and does not provide specific hardware details (e.g., GPU models, CPU types, or memory specifications) used for running the experiments. |
| Software Dependencies | No | For our results, we rely on excellent software packages, notably numpy (Harris et al., 2020), scipy (Virtanen et al., 2020), pytorch (Paszke et al., 2019), einops (Rogozhnikov, 2022), matplotlib (Hunter, 2007), hydra (Yadan, 2019) and jupyter (Granger & Pérez, 2021). These software packages are listed, but the specific version numbers necessary for reproducibility are not provided. |
| Experiment Setup | Yes | We use the parameters listed in Table 2 for all models. For the range of t_k in the initialization of LSSL, we set t_min = 10 and t_max = 1000 to cover a range of time scales. Table 2 (hyperparameters of the LSSL architecture used for the SC10 experiments): Layers = 4; N = 128; Linear Embedding Size = 128; Latent Channels = 4; Dropout = 0.1; UnHiPPO σ² = 10¹⁰; Training Steps = 100000; Batch Size = 16. |
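The loop-and-cut preprocessing quoted in the Dataset Splits row ("We loop recordings shorter than one second and then cut them so that each sample is a univariate sequence of length 8000") can be sketched as follows. This is a minimal illustration under our own assumptions; the function name `loop_and_cut` and the exact tiling behavior are not taken from the paper's implementation.

```python
import numpy as np

TARGET_LEN = 8000  # univariate sequence length stated in the paper


def loop_and_cut(recording: np.ndarray, target_len: int = TARGET_LEN) -> np.ndarray:
    """Repeat (loop) a recording until it covers target_len samples, then cut.

    Hypothetical sketch of the preprocessing described in the paper; the
    authors' actual code may differ (e.g., padding instead of tiling).
    """
    if recording.size == 0:
        raise ValueError("cannot loop an empty recording")
    repeats = -(-target_len // recording.size)  # ceiling division
    return np.tile(recording, repeats)[:target_len]


# Example: a 0.375 s clip at 8 kHz (3000 samples) is looped to 8000 samples.
clip = np.arange(3000, dtype=np.float32)
sample = loop_and_cut(clip)
```

After this step, every sample has the fixed length 8000 regardless of the original recording duration, which is what makes fixed-shape batching (batch size 16 in Table 2) possible.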