Self-supervised contrastive learning performs non-linear system identification
Authors: Rodrigo Gonzalez Laiz, Tobias Schmidt, Steffen Schneider
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here, we deepen this connection and show that SSL can perform system identification in latent space. We propose dynamics contrastive learning, a framework to uncover linear, switching linear and non-linear dynamics under a non-linear observation model, give theoretical guarantees and validate them empirically. Code: github.com/dynamical-inference/dcl [...] To verify our theory, we implement a benchmark dataset for studying the effects of various model choices. We generate time-series with 1M samples, either as a single sequence or across multiple trials. Our experiments rigorously evaluate different variants of contrastive learning algorithms. |
| Researcher Affiliation | Academia | Institute of Computational Biology, Computational Health Center, Helmholtz Munich and Munich Center for Machine Learning (MCML) |
| Pseudocode | No | The paper describes methods and models using mathematical equations and diagrams (e.g., Figure 1: DCL framework, Figure 3: epsilon-SLDS model components), but does not include any explicitly labeled pseudocode or algorithm blocks with structured, code-like steps. |
| Open Source Code | Yes | Code: github.com/dynamical-inference/dcl [...] Code. Code is available at https://github.com/dynamical-inference/dcl under an Apache 2.0 license. |
| Open Datasets | Yes | J APPLICATION TO REAL-WORLD DATA [...] Here we evaluate DCL using a real-world dataset obtained from the Allen Institute (de Vries et al., 2020). The dataset contains recordings from awake, head-fixed mice as they viewed visual stimuli including three movies on a continuous loop. [...] This exact dataset was also used by Schneider et al. (2023) and available as allen-movie-one-ca-VISp-800-* in the CEBRA software package. |
| Dataset Splits | Yes | We train on the neural activity of the first 9 repetitions (8100 samples, 270s) and use the 10th (900 samples, 30s) for evaluation. |
| Hardware Specification | Yes | Experiments were carried out on a compute cluster with A100 cards. On each card, we ran 3 experiments simultaneously. [...] The combined experiments ran for this paper comprised about 120 days of A100 compute time and we provide a breakdown in Appendix K. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and GELU activations, and cites relevant papers for these components. However, it does not specify version numbers for any software libraries, programming languages, or specific frameworks (e.g., PyTorch, TensorFlow) used in the implementation. |
| Experiment Setup | Yes | For the feature encoder h, baseline and our model use an MLP with three layers followed by GELU activations (Hendrycks & Gimpel, 2016). Model capacity scales with the embedding dimensionality d. The last hidden layer has 10d units and all previous layers have 30d units. For the SLDS and LDS datasets, we train on batches with 2048 samples each (reference and positive). We use 2^16 = 65536 negative samples for SLDS and 20k negative samples for LDS data. For the Lorenz data, we use a batch size of 1024 and 20k negative samples. We use the Adam optimizer (Kingma, 2014) with learning rates 3e-4 for LDS data, 1e-3 for SLDS data, and 1e-4 for Lorenz system data. For the SLDS data, we use a different learning rate of 1e-2 for the parameters of the dynamics model. We train for 50k steps on SLDS data and for 30k steps for LDS and Lorenz system data. |
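The experiment-setup row above describes the feature encoder h as a three-layer MLP with GELU activations, a last hidden layer of 10d units, preceding layers of 30d units, and Adam with per-dataset learning rates. As the Software Dependencies row notes, the paper names no framework, so the following is a minimal sketch in PyTorch; the exact layer count/ordering, the input dimensionality, and the function name `make_feature_encoder` are assumptions for illustration, not the authors' implementation.

```python
import torch
from torch import nn


def make_feature_encoder(input_dim: int, d: int) -> nn.Sequential:
    """Sketch of the feature encoder h: an MLP with GELU activations
    where the last hidden layer has 10*d units and earlier hidden
    layers have 30*d units, projecting to embedding dimensionality d.
    Layer count/ordering is an assumption based on the quoted setup."""
    return nn.Sequential(
        nn.Linear(input_dim, 30 * d),
        nn.GELU(),
        nn.Linear(30 * d, 30 * d),
        nn.GELU(),
        nn.Linear(30 * d, 10 * d),
        nn.GELU(),
        nn.Linear(10 * d, d),  # final projection to the d-dim embedding
    )


# Example: Adam with the learning rate reported for LDS data (3e-4).
d = 3
encoder = make_feature_encoder(input_dim=100, d=d)
optimizer = torch.optim.Adam(encoder.parameters(), lr=3e-4)
embedding = encoder(torch.randn(2048, 100))  # batch of 2048 reference samples
```

Note that the quoted setup uses separate learning rates for the dynamics-model parameters on SLDS data (1e-2), which would require passing per-parameter-group options to `torch.optim.Adam`; that detail is omitted from this sketch.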