Invariant Structure Learning for Better Generalization and Causal Explainability

Authors: Yunhao Ge, Sercan O Arik, Jinsung Yoon, Ao Xu, Laurent Itti, Tomas Pfister

TMLR 2023

Reproducibility Variable Result LLM Response
Research Type | Experimental | We demonstrate the effectiveness of ISL on various synthetic and real-world datasets. ISL yields state-of-the-art SCM discovery (clearly outperforming alternatives on real-world data), with a particularly prominent improvement for complex graph structures. In addition, ISL improves the test prediction accuracy throughout, with especially large improvements in cases with significant data drifts (up to 80% MSE reduction compared to alternatives). Section 4, Experiments: In this section, we evaluate the proposed ISL framework for causal explainability and better generalization. We conduct extensive experiments in two settings based on the availability of target labels: supervised learning tasks in Sec. 4.1 and self-supervised learning tasks in Sec. 4.2. Details and more results are provided in Appendix D. Baselines: For causal explainability, we choose NOTEARS-MLP (Zheng et al., 2020), GOLEM (Ng et al., 2020), and No Fear (Wei et al., 2020) as the baselines for learning the SCM, which is represented as a DAG. For target prediction, we choose a standard MLP and CASTLE (Kyono et al., 2020) as the baseline methods. Metrics: We evaluate the estimated Y-related DAG and whole DAG structure using Structural Hamming Distance (SHD), the number of missing, falsely detected, or reversed edges (the lower, the better). We evaluate the target (Y) prediction accuracy in Mean Squared Error (MSE). We compute the SHD and the errors multiple times and report the mean value.
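The SHD metric quoted above can be sketched in a few lines (a minimal illustration, not the authors' implementation; the convention that a reversed edge counts as a single error is an assumption):

```python
import numpy as np

def shd(est: np.ndarray, true: np.ndarray) -> int:
    """Structural Hamming Distance between binary DAG adjacency matrices:
    the number of missing, falsely detected, or reversed edges.
    A reversed edge is counted once (an assumed convention)."""
    diff = np.abs(est - true)
    # A reversed edge produces a disagreement at both (i, j) and (j, i);
    # symmetrize and count each unordered pair at most once.
    disagreement = (diff + diff.T) > 0
    return int(np.triu(disagreement, k=1).sum())
```

For example, reversing a single edge of the true graph yields an SHD of 1, as does deleting or adding one edge.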
Researcher Affiliation | Collaboration | Yunhao Ge, Sercan Ö. Arık, Jinsung Yoon, Ao Xu, Laurent Itti, and Tomas Pfister. Google Cloud AI, Sunnyvale, CA, USA; University of Southern California, Los Angeles, CA, USA.
Pseudocode | Yes | Algorithm 1: Supervised Invariant Structure Learning. Input: Dataset D. Output: DAG, Y predictor f(X) = h_{θ_Y}^1(X) ... Algorithm 2: Self-Supervised Invariant Structure Learning. Input: Dataset D. Output: DAG.
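Both algorithms output a DAG, which in NOTEARS-style structure learning (the family the paper's baselines belong to) is typically enforced through a differentiable acyclicity penalty. A sketch of that penalty, using a truncated matrix-exponential series; ISL's exact loss may differ:

```python
import numpy as np

def notears_h(W: np.ndarray) -> float:
    """Acyclicity penalty h(W) = tr(exp(W ∘ W)) - d, which is zero iff the
    weighted graph W is a DAG. The exponential series is truncated after d
    terms, which is exact for DAGs since W ∘ W is then nilpotent.
    (Sketch of the constraint family used by the NOTEARS-style baselines.)"""
    d = W.shape[0]
    M = W * W  # Hadamard square keeps the penalty differentiable in W
    acc, term = np.eye(d), np.eye(d)
    for k in range(1, d + 1):
        term = term @ M / k
        acc = acc + term
    return float(np.trace(acc) - d)
```

The penalty is zero for an acyclic weight matrix and strictly positive whenever a cycle is present, so it can be driven to zero with an augmented-Lagrangian scheme during training.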
Open Source Code | Yes | We open-source our code at https://github.com/AaronXu9/ISL.git.
Open Datasets | Yes | We perform supervised learning experiments on real-world datasets with GT causal structure: the Boston Housing (Binder et al., 1997; bos) and Insurance (Binder et al., 1997; ins) datasets. ... The Sachs dataset is for the discovery of the protein signaling network on expression levels of different proteins and phospholipids in human cells (Sachs et al., 2005), and is a popular benchmark for causal graph discovery, containing both observational and interventional data. Dataset links: http://lib.stat.cmu.edu/datasets/boston, https://link.springer.com/article/10.1023/A:1007421730016, https://www.science.org/doi/full/10.1126/science.1105809.
Dataset Splits | Yes | We perform supervised learning experiments on real-world datasets with GT causal structure: the Boston Housing (Binder et al., 1997; bos) and Insurance (Binder et al., 1997; ins) datasets. For each, we randomly split the data into train/validation/test with the proportions 0.8/0.1/0.1.
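The quoted 0.8/0.1/0.1 random split can be reproduced in a few lines (a sketch only; the paper does not state its random seed or splitting code, so the seed here is an assumption):

```python
import numpy as np

def split_indices(n: int, seed: int = 0):
    """Random train/validation/test index split with proportions 0.8/0.1/0.1."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (perm[:n_train],
            perm[n_train:n_train + n_val],
            perm[n_train + n_val:])
```

The three index arrays partition range(n), so every sample lands in exactly one split.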
Hardware Specification | Yes | The time measurements were obtained on an Apple M1 Pro chip with 16 GB of memory.
Software Dependencies | No | The paper mentions mathematical optimization methods such as L-BFGS-B (Zhu et al., 1997) and clustering algorithms such as K-means (Lloyd, 1982), but it does not specify the software libraries or frameworks used (e.g., PyTorch, TensorFlow, scikit-learn) or their version numbers, which are necessary for reproducible software dependencies.
Experiment Setup | Yes | We set a minimum edge number Emin and a maximum edge number Emax based on the dataset information. Usually, Emin is half of the number of nodes, |E|/2, and Emax is 5|E|. We also set a range of thresholds t ∈ [tmin, tmax] and a step size ts based on the value range of W. Usually we use tmin = min(W) and tmax = max(W). ... We choose the values of γ and βi that achieve the smallest target Y reconstruction error on the validation set. We find the parameters γ = 1; β1 = 0.001; β2 = 0.01; β3 = 0.01; β4 = 0.01 to be reasonable choices across many different settings, although they are not extensively optimized.
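The threshold-sweep pruning described in the quote can be sketched as follows (an assumed reading: sweep t from tmin to tmax in steps of ts and keep the first binarized graph whose edge count falls in [Emin, Emax]; ISL's exact procedure, e.g. tie-breaking or acyclicity checks, may differ):

```python
import numpy as np

def prune_by_threshold(W: np.ndarray, e_min: int, e_max: int, t_step: float):
    """Sweep a threshold t over [min|W|, max|W|] in steps of t_step and
    return the first binarized adjacency matrix whose edge count lies
    in [e_min, e_max], together with the threshold used."""
    abs_w = np.abs(W)
    t, t_max = abs_w.min(), abs_w.max()
    while t <= t_max:
        adjacency = (abs_w > t).astype(int)
        if e_min <= adjacency.sum() <= e_max:
            return adjacency, t
        t += t_step
    # Fallback: no threshold met the edge budget; return the sparsest graph.
    return (abs_w > t_max).astype(int), t_max
```

Smaller thresholds keep more edges, so the sweep naturally trades off graph density against the edge budget derived from the dataset.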