Data-Driven Selection of Instrumental Variables for Additive Nonlinear, Constant Effects Models

Authors: Xichen Guo, Feng Xie, Yan Zeng, Hao Zhang, Zhi Geng

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on both synthetic and two real-world datasets demonstrate the effectiveness and robustness of our proposed approach, highlighting its potential for broader applications in causal analysis.
Researcher Affiliation | Academia | 1 Department of Applied Statistics, Beijing Technology and Business University, Beijing, China; 2 SIAT, Chinese Academy of Sciences, Shenzhen, China. Correspondence to: Feng Xie <EMAIL>.
Pseudocode | Yes | Algorithm 1 CAT
Open Source Code | Yes | The source code is available in the Supplementary Material.
Open Datasets | Yes | Colonial Origins Data (Acemoglu et al., 2001). Children and Mothers Labor Supply Data (Angrist & Evans, 1996).
Dataset Splits | No | The paper uses synthetic data with specified sample sizes (1k, 3k, 5k). For one real-world dataset, it mentions randomly selecting 5% of the data for testing and averaging over 10 repeated tests. It does not specify explicit train/test/validation splits for model training and evaluation.
Hardware Specification | Yes | All experiments were performed with Intel 2.90 GHz and 2.89 GHz CPUs and 128 GB of memory.
Software Dependencies | No | The paper mentions several R packages (RobustIV, CIIV, sisVIVE) and their availability for *comparison methods* (TSHT, CIIV, sisVIVE, MR-Egger) but does not provide specific version numbers for the software used to implement the proposed CAT algorithm.
Experiment Setup | No | The paper describes the data generation mechanisms for synthetic data (including noise distributions and coefficient ranges) and gives some details about real-world data processing (e.g., sample selection, variable definitions). However, it does not explicitly detail hyperparameters, optimizer settings, or other system-level training configurations for the proposed algorithm.
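
The Dataset Splits entry mentions holding out a random 5% of one real-world dataset for testing and averaging over 10 repeated tests. A minimal sketch of such a repeated random-holdout protocol is shown below. It is an illustration only, not the authors' code: the function name `repeated_holdout`, the `evaluate` callback, and the seed handling are all assumptions, since the summary does not describe how the splits were actually drawn.

```python
import random
from statistics import mean


def repeated_holdout(n_samples, evaluate, test_frac=0.05, n_repeats=10, seed=0):
    """Average evaluate(train_idx, test_idx) over repeated random holdouts.

    Hypothetical helper: `evaluate` is any user-supplied callable that fits
    on the training indices and returns a numeric score on the test indices.
    """
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    n_test = max(1, round(n_samples * test_frac))
    scores = []
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # fresh random permutation for each repeat
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        scores.append(evaluate(train_idx, test_idx))
    return mean(scores), scores


if __name__ == "__main__":
    # Placeholder scorer (assumption): reports only the test-set size.
    avg, per_run = repeated_holdout(1000, lambda train, test: len(test))
    print(avg, len(per_run))
```

Recording the seed alongside `test_frac` and `n_repeats` is one way to make this kind of evaluation reproducible when fixed train/test splits are not published.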