Latent Variable Causal Discovery under Selection Bias
Authors: Haoyue Dai, Yiwen Qiu, Ignavier Ng, Xinshuai Dong, Peter Spirtes, Kun Zhang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Simulations and real-world experiments confirm the effectiveness of using our rank constraints. [...] We conduct empirical studies on synthetic data to evaluate our method against existing ones. [...] We conduct experiments on synthetic and real-world data, showing that our method effectively recovers the causal structure under selection. |
| Researcher Affiliation | Academia | ¹Carnegie Mellon University; ²Mohamed bin Zayed University of Artificial Intelligence. |
| Pseudocode | No | The paper describes methods and proofs in prose and mathematical notation but does not contain a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | An implementation of our method is available at https://github.com/MarkDana/Latent-Selection. |
| Open Datasets | Yes | We examine the (1) World Value Survey (WVS) dataset and the (2) Big Five Personality (BIG5) dataset. [...] https://www.worldvaluessurvey.org/WVSDocumentationWV7.jsp [...] https://openpsychometrics.org/ |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test splits. It states the selection criterion for synthetic data ("retain only the samples where these selection variables fall within the 40th to 60th percentile of their values") and collection details for real-world data ("Data are collected across different nations, with varying sizes (500–4000 samples)"), but no standard data partitioning for reproduction. |
| Hardware Specification | Yes | All experiments are from 5 random runs with 2 CPUs and 16 GB of memory. |
| Software Dependencies | No | The paper does not explicitly mention specific software dependencies with version numbers. |
| Experiment Setup | Yes | We first generate a random Erdős–Rényi graph (Erdős & Rényi, 1959) among the n ∈ {5, 10, 15, 20} latent variables, with an average degree of 2. Each latent variable has either 2 or 3 observed variables as children. The linear coefficients of the edges are sampled uniformly at random from [−2, −0.5] ∪ [0.5, 2]. [...] Here, we consider both Gaussian and exponential distributions for the error terms. |
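
The synthetic setup quoted in the table (Erdős–Rényi latent graph, 2 or 3 observed children per latent, coefficients from [−2, −0.5] ∪ [0.5, 2], Gaussian or exponential errors, and selection on the 40th to 60th percentile) can be sketched roughly as follows. This is a minimal illustration and not the authors' released code: the function names `sample_coefficients`, `sample_noise`, `simulate`, and `select_by_percentile` are hypothetical, the noise scales are assumed to be 1, and the way the selection variable is built (a noisy linear child of the first two latents) is an assumption, since the excerpt does not say how selection variables are generated.

```python
import numpy as np


def sample_coefficients(rng, size):
    """Draw edge weights uniformly from [-2, -0.5] U [0.5, 2]."""
    magnitude = rng.uniform(0.5, 2.0, size=size)
    sign = rng.choice([-1.0, 1.0], size=size)
    return sign * magnitude


def sample_noise(rng, shape, kind):
    """Error terms: standard Gaussian or unit-scale exponential (scales assumed)."""
    if kind == "gaussian":
        return rng.standard_normal(shape)
    if kind == "exponential":
        return rng.exponential(1.0, size=shape)
    raise ValueError(f"unknown noise kind: {kind}")


def simulate(n_latent=5, avg_degree=2.0, n_samples=10000, noise="gaussian", seed=0):
    rng = np.random.default_rng(seed)

    # Erdos-Renyi DAG over latents: edge i -> j (i < j) is kept with probability p,
    # chosen so the expected average degree p * (n_latent - 1) equals avg_degree.
    p = min(1.0, avg_degree / (n_latent - 1))
    mask = np.triu(rng.random((n_latent, n_latent)) < p, k=1)
    B = mask * sample_coefficients(rng, (n_latent, n_latent))

    # Generate latents in topological order (column j depends on columns < j).
    latents = sample_noise(rng, (n_samples, n_latent), noise)
    for j in range(n_latent):
        latents[:, j] += latents[:, :j] @ B[:j, j]

    # Each latent gets 2 or 3 observed children, each with its own loading.
    n_children = rng.integers(2, 4, size=n_latent)
    parent_of = np.repeat(np.arange(n_latent), n_children)
    loadings = sample_coefficients(rng, parent_of.size)
    observed = latents[:, parent_of] * loadings
    observed += sample_noise(rng, observed.shape, noise)
    return B, latents, observed, parent_of


def select_by_percentile(data, selection, lo=40.0, hi=60.0):
    """Keep rows whose selection variables all lie within the [lo, hi] percentiles."""
    selection = np.asarray(selection).reshape(len(data), -1)
    lower = np.percentile(selection, lo, axis=0)
    upper = np.percentile(selection, hi, axis=0)
    keep = np.all((selection >= lower) & (selection <= upper), axis=1)
    return data[keep], keep


B, latents, observed, parent_of = simulate(n_latent=5, seed=0)
rng = np.random.default_rng(1)
# Hypothetical selection variable: a noisy linear child of the first two latents.
s = latents[:, :2] @ sample_coefficients(rng, 2) + rng.standard_normal(len(latents))
selected, keep = select_by_percentile(observed, s)
print(observed.shape, selected.shape)
```

With a single selection variable, the percentile filter retains roughly 20% of the samples; with several selection variables the retained fraction shrinks further, which is worth keeping in mind when choosing `n_samples`.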