UnHiPPO: Uncertainty-aware Initialization for State Space Models

Authors: Marten Lienen, Abdullah Saydemir, Stephan Günnemann

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. "Our experiments show that our initialization improves the resistance of state-space models to noise both at training and inference time. Our experiments in Section 6 demonstrate how the UnHiPPO initialization improves the robustness of SSMs against noise using the example of LSSL."
Researcher Affiliation: Academia. "¹Department of Computer Science, Technical University of Munich; ²Munich Data Science Institute, Technical University of Munich. Correspondence to: Marten Lienen <EMAIL>."
Pseudocode: No. The paper primarily presents mathematical derivations and equations (e.g., Eq. (4), Eq. (6), Eq. (7), Eq. (8), Eq. (12), Eq. (20), Eq. (25)), but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code: Yes. "Find our implementation at cs.cit.tum.de/daml/unhippo."
Open Datasets: Yes. "We evaluate UnLSSL on two sequence classification datasets, the Free Spoken Digits dataset (FSD) (Jackson et al., 2018) and a 10-class subset of the Speech Commands dataset (SC10) (Warden, 2018)."
Dataset Splits: No. The paper mentions training and test sets and describes preprocessing such as looping and cutting recordings: "We loop recordings shorter than one second and then cut them so that each sample is a univariate sequence of length 8000." However, it does not specify explicit split percentages, sample counts for train/validation/test sets, or standard predefined splits for reproducibility.
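The looping-and-cutting preprocessing quoted above can be sketched as follows. This is a minimal sketch using numpy (which the paper lists among its dependencies); the function name and exact tiling logic are assumptions, not the authors' code.

```python
import numpy as np

def loop_and_cut(recording: np.ndarray, target_len: int = 8000) -> np.ndarray:
    """Loop a short recording until it covers target_len samples, then cut.

    Hypothetical sketch of the preprocessing described in the paper:
    recordings shorter than one second are looped and then cut so that
    each sample is a univariate sequence of length 8000.
    """
    if recording.size == 0:
        raise ValueError("empty recording")
    repeats = -(-target_len // recording.size)  # ceiling division
    looped = np.tile(recording, repeats)        # repeat end-to-end
    return looped[:target_len]                  # cut to fixed length
```

For example, a 3000-sample clip is tiled three times (9000 samples) and cut back to 8000, while a clip longer than 8000 samples is simply truncated.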
Hardware Specification: No. The Acknowledgments section mentions the "Munich Center for Machine Learning for providing compute resources," but this general statement gives no specific hardware details (e.g., GPU models, CPU types, or memory specifications) used for the experiments.
Software Dependencies: No. "For our results, we rely on excellent software packages, notably numpy (Harris et al., 2020), scipy (Virtanen et al., 2020), pytorch (Paszke et al., 2019), einops (Rogozhnikov, 2022), matplotlib (Hunter, 2007), hydra (Yadan, 2019) and jupyter (Granger & Pérez, 2021)." These packages are listed, but no version numbers are provided, which would be necessary for reproducibility.
Experiment Setup: Yes. "We use the parameters listed in Table 2 for all models. For the range of t_k in the initialization of LSSL, we set t_min = 10 and t_max = 1000 to cover a range of time scales."

Table 2. Hyperparameters of the LSSL architecture used for the SC10 experiments.
  Layers: 4
  N: 128
  Linear Embedding Size: 128
  Latent Channels: 4
  Dropout: 0.1
  UnHiPPO σ²: 10^10
  Training Steps: 100000
  Batch Size: 16