Automated Detection of Causal Inference Opportunities: Regression Discontinuity Subgroup Discovery
Authors: Tony Liu, Patrick Lawlor, Lyle Ungar, Konrad Kording, Rahul Ladhania
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show the utility of our approach through both synthetic experiments (Section 5) and a case study using a medical claims dataset consisting of over 60 million patients (Section 6). |
| Researcher Affiliation | Collaboration | Tony Liu, University of Pennsylvania, Roblox |
| Pseudocode | Yes | Algorithm 1: RD SubGroup Discovery (RDSGD) |
| Open Source Code | Yes | Source code and the data needed to reproduce all figures are available at: https://github.com/tliu526/rdsgd. |
| Open Datasets | Yes | Source code and the data needed to reproduce all figures are available at: https://github.com/tliu526/rdsgd. We do, however, provide descriptive statistics of the data presented in Table D.1, as well as anonymized datasets that are sufficient to recreate all figures in this paper. |
| Dataset Splits | Yes | We split our data into equally sized samples S1, S2 for each clinical context. |
| Hardware Specification | Yes | All simulations were run on an Ubuntu 20.04 LTS server with a 24-core Intel i9-7920X CPU and 94 GB RAM. Claims data analyses were run on a secure CentOS Linux 7 server with a 40-core Intel Xeon E5-4650 CPU and 504 GB RAM. |
| Software Dependencies | No | The paper mentions "Causal forests were fit according to default parameters specified in the EconML package (Battocchi et al., 2019)" and "LogisticRegressionCV scikit-learn models" but does not specify version numbers for these software components. |
| Experiment Setup | Yes | Causal forests were fit according to default parameters specified in the EconML package (Battocchi et al., 2019) (with honesty enabled for valid and unbiased inference), and a fixed depth of 3 and minimum leaf size of 100 were used for subsequent CATE causal trees distilled from the forests to ensure subgroups remained interpretable. The causal forest implementation in EconML by default runs a two-fold cross-validation internally when selecting hyperparameters for the LogisticRegressionCV scikit-learn models for treatment, which searches over L2 regularization parameters in a grid of 10 values between 1e-4 and 1e4 using the default accuracy criterion. |
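
The Dataset Splits and Experiment Setup rows describe two mechanical steps that a short sketch can make concrete: splitting the data into two equally sized samples S1 and S2, and the grid of 10 regularization values between 1e-4 and 1e4. The code below is an illustration only and is not taken from the authors' repository. The helper names `log_spaced_grid` and `split_in_half` are hypothetical, the random shuffle and fixed seed are assumptions (the quoted text does not say how the split was drawn), and the logarithmic spacing follows scikit-learn's documented behavior for an integer `Cs` rather than anything stated in the paper.

```python
import random


def log_spaced_grid(low_exp=-4, high_exp=4, num=10):
    """Return `num` values spaced evenly on a log10 scale.

    With the defaults this spans 1e-4 to 1e4, matching the grid of 10
    regularization values described in the Experiment Setup row.
    """
    step = (high_exp - low_exp) / (num - 1)
    return [10 ** (low_exp + i * step) for i in range(num)]


def split_in_half(indices, seed=0):
    """Shuffle indices and split them into two equally sized samples.

    Assumption: a seeded random shuffle; the source only says the samples
    S1 and S2 are equally sized. If the input length is odd, the last
    shuffled element is dropped so both halves have the same size.
    """
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:2 * half]


# Hypothetical usage: 1000 row indices standing in for one clinical context.
grid = log_spaced_grid()
s1, s2 = split_in_half(range(1000))
print(len(grid), len(s1), len(s2))
```

In practice the grid would not be built by hand: passing an integer `Cs` to scikit-learn's `LogisticRegressionCV` produces it internally. The helper only makes the spacing explicit.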