Sanity Checking Causal Representation Learning on a Simple Real-World System

Authors: Juan L. Gamella, Simon Bing, Jakob Runge

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate methods for causal representation learning (CRL) on a simple, real-world system where these methods are expected to work. The system consists of a controlled optical experiment specifically built for this purpose, which satisfies the core assumptions of CRL and where the underlying causal factors (the inputs to the experiment) are known, providing a ground truth. We select methods representative of different approaches to CRL and find that they all fail to recover the underlying causal factors. To understand the failure modes of the evaluated algorithms, we perform an ablation on the data by substituting the real data-generating process with a simpler synthetic equivalent. The results reveal a reproducibility problem, as most methods already fail on this synthetic ablation despite its simple data-generating process."
Researcher Affiliation | Academia | "Seminar for Statistics, ETH Zurich; Technische Universität Berlin; Department of Computer Science, University of Potsdam; ScaDS.AI Dresden/Leipzig, TU Dresden."
Pseudocode | No | The paper describes methods and implementations through textual descriptions and equations, but does not contain a clearly labeled pseudocode or algorithm block, nor structured steps formatted like code.
Open Source Code | Yes | "The code to reproduce the results of this paper can be found at github.com/simonbing/CRLSanityCheck."
Open Datasets | Yes | "We make the novel datasets and their data-collection procedures publicly available in the lt_crl_benchmark_v1 dataset at github.com/juangamella/causal-chamber."
Dataset Splits | Yes | "The full dataset is then split into train, validation, and test subsets according to the ratios (80/10/10) while ensuring that each subset contains the same fraction of samples from each environment."
Hardware Specification | Yes | "All experiments were run on a high-performance cluster with NVIDIA A100 GPUs."
Software Dependencies | No | "All implementations use the PyTorch machine learning library (Paszke et al., 2019)." The library is named, but no version numbers or full dependency list are specified.
Experiment Setup | Yes | "We report the hyperparameters used during training in Table 2."
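The Dataset Splits row describes an 80/10/10 split in which each subset preserves the per-environment sample fractions, i.e., a split stratified by environment. A minimal sketch of such a procedure is shown below; the function name, signature, and seed are assumptions for illustration, not the authors' implementation.

```python
import random
from collections import defaultdict

def stratified_split(env_labels, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split sample indices into train/val/test subsets so that each
    subset contains the same fraction of samples from every environment.
    Hypothetical sketch; not the paper's actual code."""
    # Group sample indices by their environment label.
    by_env = defaultdict(list)
    for i, env in enumerate(env_labels):
        by_env[env].append(i)

    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    # Shuffle and split each environment's samples independently,
    # so the global ratios hold within every environment.
    for idx in by_env.values():
        rng.shuffle(idx)
        n_train = int(ratios[0] * len(idx))
        n_val = int(ratios[1] * len(idx))
        splits["train"].extend(idx[:n_train])
        splits["val"].extend(idx[n_train:n_train + n_val])
        splits["test"].extend(idx[n_train + n_val:])
    return splits
```

Splitting within each environment (rather than shuffling the pooled data) is what guarantees the stated property: every subset inherits the same environment proportions as the full dataset.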