When Selection Meets Intervention: Additional Complexities in Causal Discovery

Authors: Haoyue Dai, Ignavier Ng, Jianle Sun, Zeyu Tang, Gongxu Luo, Xinshuai Dong, Peter Spirtes, Kun Zhang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present empirical studies on simulations and real-world data to demonstrate that our algorithm effectively identifies true causal relations despite the presence of selection bias.
Researcher Affiliation | Academia | ¹Carnegie Mellon University; ²Mohamed bin Zayed University of Artificial Intelligence
Pseudocode | Yes | The pseudocode for the CDIS algorithm is provided below: Algorithm 1: Causal Discovery from Interventional data under potential Selection bias (CDIS). Input: Observational and interventional data $\{p^{(k)}\}_{k=0}^{K}$ over $X_{[D]}$ with unknown targets. Output: A partially ancestral graph (PAG) over vertices $[D]$. Step 1: Get maximal orientation from pure observational data. $\hat{M}^{(0)} \leftarrow \mathrm{FCI+}(\mathrm{FAS}(p^{(0)}))$.
Open Source Code | Yes | A Python implementation of CDIS is available at https://github.com/MarkDana/CDIS.
Open Datasets | Yes | We demonstrate the effectiveness of our algorithm using synthetic and real-world datasets on biology and education (§5) ... We evaluate the gene regulation networks (GRNs) of 24 previously reported essential regulatory genes encoding different transcription factors (TFs) (Yang et al., 2018) using single-cell perturbation data, i.e., sciPlex2 (Peidli et al., 2024) ... We also apply CDIS to an educational dataset (Table 1), from a randomized controlled trial evaluating the effects of incentives and services on college freshmen's academic achievements (Angrist et al., 2009).
Dataset Splits | Yes | Sample selection is performed based on each $S_i$, as a linear sum of its parents in $X$ and an independent noise, falling within a predefined interval and ensuring the desired sample size (5,000 samples after selection) ... As depicted in Figure 10, subgroup analysis stratified by gender indicates that SSP only improves the women's performance while SFP shows effects only on men (see Figure 11).
Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. It mentions running simulations and real-world experiments but does not specify GPU/CPU models, processor types, or memory amounts.
Software Dependencies | No | We use the implementation of IGSP, UT-IGSP, and JCI-GSP from the causaldag package (Chandler Squires, 2018), and the implementation of CD-NOD from the causal-learn package (Zheng et al., 2023). We use the Fisher Z test to examine conditional relations.
Experiment Setup | Yes | Specifically, we begin by randomly sampling Erdős–Rényi (Erdős & Rényi, 1959) graphs with an average degree of 2 as the ground truth DAG for $\{X_i\}_{i=1}^{D}$ ... We then simulate linear SEMs for $\{X_i\}_{i=1}^{D}$ with exogenous noise terms $\{\epsilon_i\}_{i=1}^{D}$ ... Here, the linear edge coefficients are sampled from $\mathrm{Unif}([-2, -0.5] \cup [0.5, 2])$, and the variances of exogenous noise terms are sampled from $\mathrm{Unif}[1, 4]$. Finally, using the exogenous noise terms $\{\epsilon_i\}_{i=1}^{D}$ of the selected samples, we simulate interventional data over $\{X_i\}_{i=1}^{D}$ under different targets. We simulate a total of $D/2$ interventions, each with one variable being intervened on ... The significance level is set to 0.05 for 5 and 10 variables, and 0.01 for 15 and 20 variables.
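
The simulation procedure quoted in the Experiment Setup row can be sketched roughly as follows. This is a minimal illustration and not the authors' code. The quoted text does not specify several details, so the sketch assumes them: a single selection variable with two randomly chosen parents, a selection interval of (-1, 1), Gaussian noise, and hard interventions that cut the target's incoming edges and assign it a fresh standard-normal value. All function names and the `interval` parameter are hypothetical.

```python
import numpy as np

def sample_er_dag(d, avg_degree, rng):
    """Sample an Erdős–Rényi DAG; adj[i, j] == True means an edge i -> j."""
    p = avg_degree / (d - 1)  # edge probability giving the requested expected degree
    upper = np.triu(rng.random((d, d)) < p, k=1)  # strictly upper triangular => acyclic
    perm = rng.permutation(d)  # relabel nodes so the causal order is not the index order
    return upper[np.ix_(perm, perm)]

def sample_weights(adj, rng):
    """Edge coefficients drawn from Unif([-2, -0.5] U [0.5, 2]) on existing edges."""
    mag = rng.uniform(0.5, 2.0, size=adj.shape)
    sign = rng.choice([-1.0, 1.0], size=adj.shape)
    return adj * mag * sign

def intervene(eps, W, target, rng):
    """ASSUMPTION: hard intervention. Cut edges into `target`, give it a fresh value,
    and reuse the selected samples' exogenous noise for every other variable."""
    d = W.shape[0]
    W_do = W.copy()
    W_do[:, target] = 0.0
    eps_do = eps.copy()
    eps_do[:, target] = rng.normal(size=len(eps))
    return eps_do @ np.linalg.inv(np.eye(d) - W_do)

def simulate(d=10, n_keep=5000, interval=(-1.0, 1.0), seed=0):
    rng = np.random.default_rng(seed)
    adj = sample_er_dag(d, 2, rng)
    W = sample_weights(adj, rng)
    noise_std = np.sqrt(rng.uniform(1.0, 4.0, size=d))  # noise variances from Unif[1, 4]
    mix = np.linalg.inv(np.eye(d) - W)  # X = X W + eps  =>  X = eps (I - W)^{-1}

    # ASSUMPTION: one selection variable S with two parents in X (hypothetical choice).
    sel_parents = rng.choice(d, size=2, replace=False)
    sel_w = rng.uniform(0.5, 2.0, size=2)

    kept, n = [], 0
    while n < n_keep:  # keep drawing until enough samples pass selection
        eps = rng.normal(size=(4 * n_keep, d)) * noise_std
        s = (eps @ mix)[:, sel_parents] @ sel_w + rng.normal(size=len(eps))
        mask = (s > interval[0]) & (s < interval[1])
        kept.append(eps[mask])
        n += int(mask.sum())
    eps = np.concatenate(kept)[:n_keep]  # exogenous noise of the selected samples

    targets = rng.choice(d, size=d // 2, replace=False)  # D/2 single-target interventions
    data = [eps @ mix] + [intervene(eps, W, t, rng) for t in targets]
    return adj, W, targets, data
```

Calling `simulate(d=10)` returns the ground-truth adjacency matrix, the weight matrix, the intervention targets, and a list whose first entry is the selected observational data and whose remaining entries are the interventional datasets. Reusing the selected samples' noise, as the quoted setup describes, means the selection mechanism acts on the population before any intervention is applied.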