Data-Driven Selection of Instrumental Variables for Additive Nonlinear, Constant Effects Models
Authors: Xichen Guo, Feng Xie, Yan Zeng, Hao Zhang, Zhi Geng
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on both synthetic and two real-world datasets demonstrate the effectiveness and robustness of our proposed approach, highlighting its potential for broader applications in causal analysis. |
| Researcher Affiliation | Academia | 1 Department of Applied Statistics, Beijing Technology and Business University, Beijing, China 2 SIAT, Chinese Academy of Sciences, Shenzhen, China. Correspondence to: Feng Xie <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 CAT |
| Open Source Code | Yes | The source code is available in the Supplementary Material. |
| Open Datasets | Yes | Colonial Origins Data (Acemoglu et al., 2001). Children and Mothers Labor Supply Data (Angrist & Evans, 1996). |
| Dataset Splits | No | The paper uses synthetic data with specified sample sizes (1k, 3k, 5k). For one real-world dataset, it mentions randomly selecting 5% of the data for testing and averaging over 10 repeated tests, but it does not specify explicit train/validation/test splits for model training and evaluation. |
| Hardware Specification | Yes | All experiments were performed with Intel 2.90 GHz and 2.89 GHz CPUs and 128 GB of memory. |
| Software Dependencies | No | The paper mentions several R packages (RobustIV, CIIV, sisVIVE) and their availability for *comparison methods* (TSHT, CIIV, sisVIVE, MR-Egger) but does not provide specific version numbers for the software used to implement the proposed CAT algorithm. |
| Experiment Setup | No | The paper describes data generation mechanisms for synthetic data (including noise distributions and coefficient ranges) and mentions some details about real-world data processing (e.g., sample selection, variable definitions). However, it does not explicitly detail hyperparameters, optimizer settings, or other system-level training configurations for their proposed algorithm. |
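
The Dataset Splits row mentions drawing a random 5% test subset and averaging over 10 repeated tests. The following is only a minimal sketch of that kind of repeated random subsampling, not the authors' code: the function name `repeated_subsample_mean`, the seed, the toy data, and the use of a plain mean as the evaluated statistic are all assumptions made for illustration.

```python
import random
import statistics


def repeated_subsample_mean(values, frac=0.05, repeats=10, seed=0):
    """Average a statistic over repeated random test subsets.

    Hypothetical helper: the statistic here is a simple mean, standing in
    for whatever metric an actual evaluation would compute on each subset.
    """
    rng = random.Random(seed)                 # fixed seed for repeatability
    k = max(1, round(len(values) * frac))     # size of each test subset
    per_run = []
    for _ in range(repeats):
        subset = rng.sample(values, k)        # sample without replacement
        per_run.append(statistics.mean(subset))
    return k, statistics.mean(per_run)        # subset size, averaged result


# Toy data (assumed): 1000 placeholder observations.
values = [float(i) for i in range(1000)]
k, avg = repeated_subsample_mean(values)
print(k, avg)
```

Fixing the seed makes the 10 draws reproducible, which is the kind of detail a fully specified split would need to report.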