FACTS: A Factored State-Space Framework for World Modelling
Authors: Li Nanbo, Firas Laakom, Yucheng Xu, Wenyi Wang, Jürgen Schmidhuber
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate FACTS across diverse tasks, including multivariate time series forecasting, object-centric world modelling, and spatial-temporal graph prediction, demonstrating that it consistently outperforms or matches specialised state-of-the-art models, despite its general-purpose world modelling design. ... We conduct an extensive empirical analysis across multiple tasks, such as multivariate time series forecasting and object-centric world modelling, demonstrating that FACTS consistently matches or exceeds the performance of specialised state-of-the-art models. |
| Researcher Affiliation | Academia | Li Nanbo¹, Firas Laakom¹, Yucheng Xu², Wenyi Wang¹, Jürgen Schmidhuber¹,³ — ¹Center of Excellence for Generative AI, KAUST, Saudi Arabia; ²School of Informatics, University of Edinburgh, United Kingdom; ³The Swiss AI Lab, IDSIA, USI & SUPSI, Switzerland |
| Pseudocode | Yes | Algorithm 1 FACTS Module: a Pseudo Implementation. 1: Input: X_{1:t} ∈ ℝ^{t×m×d} (t: sequential axis, m: spatial axis). 2: Output: Z_{1:t} ∈ ℝ^{t×k×d} |
| Open Source Code | Yes | Code available at: https://github.com/NanboLi/FACTS. |
| Open Datasets | Yes | We use the open-source Time Series Library (TSLib) to evaluate long-term multivariate time-series forecasting (MSTF) tasks across 9 real-world datasets spanning multiple domains (e.g., energy, weather, and finance). ... synthetic multi-object videos (Yi et al., 2020; Greff et al., 2022; Lin et al., 2020), and dynamic-graph node prediction (Li et al., 2018). |
| Dataset Splits | Yes | We use the open-source Time Series Library (TSLib)... following TSLib's standardised settings: the input sequence length is fixed at 96, with prediction lengths of {96, 192, 336, 720}. ... During testing, to ensure a fair comparison with SlotFormer, we burn-in the first 6 frames and roll out (predict) 48 frames. |
| Hardware Specification | Yes | All results reported for FACTS in this paper were generated using a single NVIDIA A100 GPU (80 GB). |
| Software Dependencies | No | The paper mentions "PyTorch" but does not specify a version number, nor does it list other software dependencies with specific versions. For example: "We implement this using PyTorch's standard Conv2d module". |
| Experiment Setup | Yes | following TSLib's standardised settings: the input sequence length is fixed at 96, with prediction lengths of {96, 192, 336, 720}. Performance is evaluated using mean-squared error (MSE) and mean-absolute error (MAE). ... For the slot dynamics prediction task... we burn-in the first 6 frames and roll out (predict) 48 frames. ... The primary modification is replacing SAVi's recurrent slot attention modules with FACTS. Importantly, all of the used modules (CNN vision encoders, FACTS, and decoders) are trained end-to-end in a single run without any supervision. |
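The pseudocode row above specifies only the FACTS module's interface: a sequence X_{1:t} of shape (t, m, d) (t: sequential axis, m: spatial axis, d: feature dimension) is mapped to k factored latents Z_{1:t} of shape (t, k, d). The sketch below illustrates that shape contract only; the attention-style pooling, the function name, and the weights are hypothetical placeholders, not the paper's actual recurrent state-space update (which the authors implement in PyTorch).

```python
import numpy as np

def facts_interface_sketch(x: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Illustrative stand-in for the FACTS module's input/output contract.

    x: array of shape (t, m, d) -- t time steps, m spatial slots/variates,
       d features per slot.
    Returns an array of shape (t, k, d) -- k factored latents per step.
    """
    t, m, d = x.shape
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((k, m))                # toy spatial-to-factor scores
    attn = np.exp(w) / np.exp(w).sum(axis=1, keepdims=True)  # softmax over m
    # Pool the m spatial inputs into k factors at every time step.
    return np.einsum('km,tmd->tkd', attn, x)

# Shapes mirror the TSLib setting quoted above: input length t = 96.
z = facts_interface_sketch(np.ones((96, 7, 16)), k=4)
print(z.shape)  # (96, 4, 16)
```

Because the toy attention rows sum to one, constant inputs pass through unchanged, which makes the shape contract easy to sanity-check.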