Localized Debiased Machine Learning: Efficient Inference on Quantile Treatment Effects and Beyond

Authors: Nathan Kallus, Xiaojie Mao, Masatoshi Uehara

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We first study the behavior of LDML in a simulation study. We then demonstrate its use in estimating the QTE of 401(k) eligibility on net financial assets, and the LQTE of 401(k) participation using eligibility as an IV."
Researcher Affiliation | Academia | Nathan Kallus (Cornell Tech, Cornell University, 2 West Loop Rd, NY 10044, USA); Xiaojie Mao (School of Economics and Management, Tsinghua University, Beijing 100084, China); Masatoshi Uehara (Cornell Tech, Cornell University, 2 West Loop Rd, NY 10044, USA)
Pseudocode | No | The paper describes the "LDML Meta-Algorithm" in Section 2.2 and "Definition 1 (3-way-cross-fold nuisance estimation)" as structured steps for its methodology. However, these are presented as definitions and descriptions rather than as explicitly labeled "Pseudocode" or "Algorithm" blocks with formal code-like formatting.
Open Source Code | Yes | Replication code is available at https://github.com/CausalML/LocalizedDebiasedMachineLearning.
Open Datasets | Yes | "We use data from Chernozhukov and Hansen (2004) to estimate the QTEs of 401(k) retirement plan eligibility on net financial assets (N = 9915)."
Dataset Splits | Yes | "Randomly permute the data indices and let D_k = {(k-1)N/K + 1, ..., kN/K}, k = 1, ..., K be a random even K-fold split of the data."
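The fold construction quoted above is a plain random even K-fold partition of the indices. A minimal sketch in Python (illustrative only; not the authors' code, which is in R):

```python
import random

def even_kfold_split(n, k, seed=None):
    # Randomly permute the indices 0..n-1, then cut the permutation
    # into K contiguous, equally sized folds -- mirroring
    # D_k = {(k-1)N/K + 1, ..., kN/K} applied after permutation.
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    size = n // k
    return [perm[i * size:(i + 1) * size] for i in range(k)]

# Example: 10 observations split into K = 5 folds of 2 each.
folds = even_kfold_split(10, 5, seed=0)
```

For N not divisible by K, a real implementation would distribute the remainder across folds rather than drop observations.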
Hardware Specification | No | The paper mentions using R packages for boosting, LASSO, and neural networks but does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running the experiments.
Software Dependencies | No | "We consider estimating both propensity score η2 and conditional cumulative distribution η1 with each of: boosting (using R package gbm), LASSO (using R package hdm), and a one-hidden-layer neural network (using R package nnet)."
Experiment Setup | Yes | "We consider estimating θ1 using five different methods. First, we consider LDML applied to the efficient estimating equation (Eq. (3)) with K = 5, K′ = 2, and θ̂^(k)_1,init estimated using 2-fold cross-fitted IPW with random-forest-estimated propensities... In each instantiation of LDML, we construct folds so as to ensure a balanced distribution of treated and untreated units, we let K′ = (K-1)/2, we use the IPW initial estimator for θ̂_1,init, we normalize propensity weights to have mean 1 within each treatment group, we use estimates given by solving the grand-average estimating equation as in Definition 2, and for variance estimation we estimate J using IPW kernel density estimation as in Remark 4." The solution to the LDML-estimated empirical estimating equation must occur at an observed outcome Y_i, so it can be found by binary search after sorting the data along outcomes.
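The last point above — that the quantile estimating equation's root lands on an observed outcome and is found by sort-plus-binary-search — can be sketched as follows. This is an illustrative sketch, not the authors' implementation; the weights `w` stand in for (normalized) inverse-propensity weights:

```python
from bisect import bisect_left
from itertools import accumulate

def weighted_quantile_root(y, w, tau):
    # Solve the empirical estimating equation
    #   sum_i w_i * (1{Y_i <= theta} - tau) = 0
    # for theta. The left-hand side is a step function that jumps
    # only at observed outcomes, so the root must be an observed
    # Y_i; sorting once and binary-searching the cumulative weights
    # locates it in O(N log N).
    pairs = sorted(zip(y, w))
    ys = [p[0] for p in pairs]
    cum_w = list(accumulate(p[1] for p in pairs))
    target = tau * cum_w[-1]
    j = bisect_left(cum_w, target)
    return ys[j]

# With uniform weights this reduces to the ordinary sample quantile.
median = weighted_quantile_root(list(range(1, 101)), [1.0] * 100, 0.5)
```

Non-uniform weights shift which observed outcome the cumulative weight crosses the target at, which is exactly how the IPW correction enters the quantile estimate.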