When Selection Meets Intervention: Additional Complexities in Causal Discovery
Authors: Haoyue Dai, Ignavier Ng, Jianle Sun, Zeyu Tang, Gongxu Luo, Xinshuai Dong, Peter Spirtes, Kun Zhang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present empirical studies on simulations and real-world data to demonstrate that our algorithm effectively identifies true causal relations despite the presence of selection bias. |
| Researcher Affiliation | Academia | Carnegie Mellon University; Mohamed bin Zayed University of Artificial Intelligence |
| Pseudocode | Yes | The pseudocode for the CDIS algorithm is provided below: Algorithm 1: Causal Discovery from Interventional data under potential Selection bias (CDIS). Input: observational and interventional data $\{p^{(k)}\}_{k=0}^{K}$ over $X_{[D]}$ with unknown targets. Output: a partial ancestral graph (PAG) over vertices $[D]$. Step 1: get the maximal orientation from purely observational data, $\hat{M}^{(0)} \leftarrow \mathrm{FCI}^{+}(\mathrm{FAS}(p^{(0)}))$. |
| Open Source Code | Yes | A Python implementation of CDIS is available at https://github.com/MarkDana/CDIS. |
| Open Datasets | Yes | We demonstrate the effectiveness of our algorithm using synthetic and real-world datasets on biology and education (Section 5)...We evaluate the gene regulatory networks (GRNs) of 24 previously reported essential regulatory genes encoding different transcription factors (TFs) (Yang et al., 2018) using single-cell perturbation data, i.e., sciPlex2 (Peidli et al., 2024)...We also apply CDIS to an educational dataset (Table 1) from a randomized controlled trial evaluating the effects of incentives and services on college freshmen's academic achievement (Angrist et al., 2009). |
| Dataset Splits | Yes | Sample selection is performed by requiring each $S_i$, defined as a linear sum of its parents in $X$ plus an independent noise term, to fall within a predefined interval, while ensuring the desired sample size (5,000 samples after selection)...As depicted in Figure 10, subgroup analysis stratified by gender indicates that SSP only improves women's performance, while SFP shows effects only on men (see Figure 11). |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. It mentions running simulations and real-world experiments but does not specify GPU/CPU models, processor types, or memory amounts. |
| Software Dependencies | No | We use the implementation of IGSP, UT-IGSP, and JCI-GSP from the causaldag package (Chandler Squires, 2018), and the implementation of CD-NOD from the causal-learn package (Zheng et al., 2023). We use the Fisher Z test to examine conditional independence relations. |
| Experiment Setup | Yes | Specifically, we begin by randomly sampling Erdős–Rényi (Erdős & Rényi, 1959) graphs with an average degree of 2 as the ground-truth DAG for $\{X_i\}_{i=1}^{D}$...We then simulate linear SEMs for $\{X_i\}_{i=1}^{D}$ with exogenous noise terms $\{\epsilon_i\}_{i=1}^{D}$...Here, the linear edge coefficients are sampled from $\mathrm{Unif}([-2, -0.5] \cup [0.5, 2])$, and the variances of the exogenous noise terms are sampled from $\mathrm{Unif}[1, 4]$. Finally, using the exogenous noise terms $\{\epsilon_i\}_{i=1}^{D}$ of the selected samples, we simulate interventional data over $\{X_i\}_{i=1}^{D}$ under different targets. We simulate a total of $D/2$ interventions, each intervening on a single variable...The significance level is set to 0.05 for 5 and 10 variables, and 0.01 for 15 and 20 variables. |
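The Software Dependencies row names the Fisher Z test, and Step 1 of the CDIS pseudocode applies FCI+ to the output of FAS, the adjacency search. The following is a minimal sketch, assuming Gaussian data, of a Fisher Z conditional-independence test and a PC-style adjacency search that deletes an edge once some conditioning set renders its endpoints independent. It is an illustration, not the CDIS implementation: the helper names `fisher_z_pvalue` and `fas_skeleton` are hypothetical, the FCI+ orientation step and all interventional steps are omitted, and the simpler order-dependent variant of the search is used for brevity.

```python
import itertools
import math

import numpy as np


def fisher_z_pvalue(data, x, y, cond=()):
    """Two-sided p-value for H0: X_x is independent of X_y given X_cond (Gaussian data)."""
    n = data.shape[0]
    corr = np.corrcoef(data[:, [x, y, *cond]], rowvar=False)
    prec = np.linalg.pinv(corr)
    # Partial correlation of x and y given cond, read off the inverse correlation matrix.
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = min(max(r, -0.999999), 0.999999)  # keep atanh finite
    # Under H0, sqrt(n - |cond| - 3) * atanh(r) is approximately N(0, 1).
    stat = math.sqrt(n - len(cond) - 3) * abs(math.atanh(r))
    return math.erfc(stat / math.sqrt(2))  # equals 2 * (1 - Phi(stat))


def fas_skeleton(data, alpha):
    """PC-style adjacency search starting from the complete undirected graph.

    At level k, the edge x - y is removed if x and y test independent given some
    size-k subset of x's other current neighbours; that subset is stored as the
    separating set. Levels grow until no node has enough neighbours left.
    """
    d = data.shape[1]
    adj = {i: set(range(d)) - {i} for i in range(d)}
    sepset = {}
    level = 0
    while any(len(adj[i]) - 1 >= level for i in range(d)):
        for x in range(d):
            for y in sorted(adj[x]):
                others = sorted(adj[x] - {y})
                for cond in itertools.combinations(others, level):
                    if fisher_z_pvalue(data, x, y, cond) > alpha:
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(cond)
                        break
        level += 1
    return adj, sepset


# Toy chain X0 -> X1 -> X2 (hypothetical coefficients).
rng = np.random.default_rng(0)
n = 5000
x0 = rng.normal(size=n)
x1 = 1.5 * x0 + rng.normal(size=n)
x2 = -0.8 * x1 + rng.normal(size=n)
data = np.column_stack([x0, x1, x2])
adj, sepset = fas_skeleton(data, alpha=0.01)
```

On this toy chain the search should keep $X_0 - X_1$ and $X_1 - X_2$ and remove $X_0 - X_2$ with separating set $\{X_1\}$, up to the usual test errors at level `alpha`. The paper's simulations use a level of 0.05 for 5 and 10 variables and 0.01 for 15 and 20 variables.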
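The first part of the Experiment Setup row (an Erdős–Rényi DAG with average degree 2, a linear SEM, edge coefficients from $\mathrm{Unif}([-2, -0.5] \cup [0.5, 2])$, and noise variances from $\mathrm{Unif}[1, 4]$) can be sketched as follows. Assumptions not stated in the row: the exogenous noise is Gaussian, "average degree" is read as the expected total (in plus out) degree of a node, and the function names `sample_er_dag`, `sample_weights`, and `simulate_linear_sem` are hypothetical.

```python
import numpy as np


def sample_er_dag(d: int, avg_degree: float, rng: np.random.Generator) -> np.ndarray:
    """Erdos-Renyi DAG: B[i, j] = 1 encodes the edge X_i -> X_j."""
    # Each of the d*(d-1)/2 pairs is an edge with prob p, so the expected degree is p*(d-1).
    p = min(1.0, avg_degree / (d - 1))
    upper = np.triu(rng.random((d, d)) < p, k=1).astype(float)
    # Relabel nodes with a random permutation so the causal order is not 0, 1, ..., d-1.
    order = rng.permutation(d)
    B = np.zeros((d, d))
    B[np.ix_(order, order)] = upper
    return B


def sample_weights(B: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Edge coefficients drawn from Unif([-2, -0.5] U [0.5, 2])."""
    # Both pieces have length 1.5, so a uniform magnitude with a fair random sign is exact.
    magnitude = rng.uniform(0.5, 2.0, size=B.shape)
    sign = rng.choice([-1.0, 1.0], size=B.shape)
    return B * magnitude * sign


def simulate_linear_sem(W, noise_var, n, rng):
    """X_j = sum_i W[i, j] X_i + eps_j, i.e. X = eps (I - W)^{-1} in row-vector form."""
    d = W.shape[0]
    eps = rng.normal(size=(n, d)) * np.sqrt(noise_var)  # Gaussian noise: an assumption
    X = eps @ np.linalg.inv(np.eye(d) - W)
    return X, eps


rng = np.random.default_rng(0)
d = 10
B = sample_er_dag(d, avg_degree=2.0, rng=rng)
W = sample_weights(B, rng)
noise_var = rng.uniform(1.0, 4.0, size=d)  # exogenous noise variances ~ Unif[1, 4]
X, eps = simulate_linear_sem(W, noise_var, n=5000, rng=rng)
```

Returning the exogenous noise `eps` alongside `X` matters for this setup, since the row describes reusing the noise terms of the selected samples to generate the interventional data.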
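The selection mechanism in the Dataset Splits row, together with the last steps of the Experiment Setup row (reusing the exogenous noise of the selected samples to generate $D/2$ single-target interventional datasets), can be sketched as below. Several details are assumptions made only for illustration: a single selection variable with hand-picked parents and coefficients, standard-normal selection noise, the interval $[-1, 1]$, batch rejection sampling until 5,000 samples are retained, and hard interventions that cut the target's incoming edges while keeping its own noise term. The helpers `select_noise` and `intervene` are hypothetical.

```python
import numpy as np


def select_noise(W, noise_var, sel_parents, sel_coef, interval, n_target, rng,
                 batch=20_000, max_rounds=1_000):
    """Return the exogenous noise of units whose selection variable S lies in `interval`.

    S = sum_k sel_coef[k] * X[sel_parents[k]] + N(0, 1) noise (an assumed form).
    Batches are drawn until n_target units survive selection.
    """
    d = W.shape[0]
    mix = np.linalg.inv(np.eye(d) - W)
    kept, total = [], 0
    for _ in range(max_rounds):
        eps = rng.normal(size=(batch, d)) * np.sqrt(noise_var)
        X = eps @ mix
        s = X[:, sel_parents] @ sel_coef + rng.normal(size=batch)
        mask = (interval[0] <= s) & (s <= interval[1])
        kept.append(eps[mask])
        total += int(mask.sum())
        if total >= n_target:
            return np.concatenate(kept)[:n_target]
    raise RuntimeError("selection interval accepts too few samples")


def intervene(W, eps, target):
    """Hard intervention on `target` (assumed type): drop its incoming edges, reuse eps."""
    W_do = W.copy()
    W_do[:, target] = 0.0
    return eps @ np.linalg.inv(np.eye(W.shape[0]) - W_do)


rng = np.random.default_rng(1)
# Toy 4-variable chain X0 -> X1 -> X2 -> X3 (hypothetical weights).
W = np.zeros((4, 4))
W[0, 1], W[1, 2], W[2, 3] = 1.2, -0.8, 1.5
noise_var = np.array([1.0, 2.0, 1.5, 3.0])
eps_sel = select_noise(W, noise_var, sel_parents=[1, 3], sel_coef=np.array([1.0, -1.0]),
                       interval=(-1.0, 1.0), n_target=5000, rng=rng)
X_obs = eps_sel @ np.linalg.inv(np.eye(4) - W)  # selected observational data
targets = rng.choice(4, size=4 // 2, replace=False)  # D/2 single-variable interventions
X_int = {int(t): intervene(W, eps_sel, int(t)) for t in targets}
```

Because every interventional dataset in this sketch reuses `eps_sel`, all datasets describe the same selected units; an intervention therefore leaves the target's ancestors unchanged (in the chain, `X_int[t][:, :t]` equals `X_obs[:, :t]`) while the target keeps only its own noise term.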