Localized Debiased Machine Learning: Efficient Inference on Quantile Treatment Effects and Beyond

Authors: Nathan Kallus, Xiaojie Mao, Masatoshi Uehara

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We first study the behavior of LDML in a simulation study. We then demonstrate its use in estimating the QTE of 401(k) eligibility on net financial assets, and the LQTE of 401(k) participation using eligibility as an IV."
Researcher Affiliation | Academia | Nathan Kallus (Cornell Tech, Cornell University, 2 West Loop Rd, NY 10044, USA); Xiaojie Mao (School of Economics and Management, Tsinghua University, Beijing 100084, China); Masatoshi Uehara (Cornell Tech, Cornell University, 2 West Loop Rd, NY 10044, USA)
Pseudocode | No | The paper describes the "LDML Meta-Algorithm" in Section 2.2 and "Definition 1 (3-way-cross-fold nuisance estimation)" as structured steps for its methodology. However, these are presented as definitions and descriptions rather than as explicitly labeled "Pseudocode" or "Algorithm" blocks with formal code-like formatting.
Open Source Code | Yes | Replication code is available at https://github.com/CausalML/LocalizedDebiasedMachineLearning.
Open Datasets | Yes | "We use data from Chernozhukov and Hansen (2004) to estimate the QTEs of 401(k) retirement plan eligibility on net financial assets (N = 9915)."
Dataset Splits | Yes | "Randomly permute the data indices and let D_k = {(k-1)N/K + 1, ..., kN/K}, k = 1, ..., K be a random even K-fold split of the data."
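The fold construction quoted above is a plain random even K-fold partition of the indices. A minimal sketch in Python (illustrative only; not the authors' code, which is in R):

```python
import random

def even_kfold_split(n, k, seed=None):
    # Randomly permute the indices 0..n-1, then cut the permutation
    # into K contiguous, equally sized folds -- mirroring
    # D_k = {(k-1)N/K + 1, ..., kN/K} applied after permutation.
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    size = n // k
    return [perm[i * size:(i + 1) * size] for i in range(k)]

# Example: 10 observations split into K = 5 folds of 2 each.
folds = even_kfold_split(10, 5, seed=0)
```

For N not divisible by K, a real implementation would distribute the remainder across folds rather than drop observations.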
Hardware Specification | No | The paper mentions using R packages for boosting, LASSO, and neural networks but does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running the experiments.
Software Dependencies | No | "We consider estimating both propensity score η2 and conditional cumulative distribution η1 with each of: boosting (using R package gbm), LASSO (using R package hdm), and a one-hidden-layer neural network (using R package nnet)."
Experiment Setup | Yes | "We consider estimating θ1 using five different methods. First, we consider LDML applied to the efficient estimating equation (Eq. (3)) with K = 5, K′ = 2, and θ̂^(k)_1,init estimated using 2-fold cross-fitted IPW with random-forest-estimated propensities... In each instantiation of LDML, we construct folds so as to ensure a balanced distribution of treated and untreated units, we let K′ = (K-1)/2, we use the IPW initial estimator for θ̂_1,init, we normalize propensity weights to have mean 1 within each treatment group, we use estimates given by solving the grand-average estimating equation as in Definition 2, and for variance estimation we estimate J using IPW kernel density estimation as in Remark 4." The solution to the LDML-estimated empirical estimating equation must occur at an observed outcome Y_i, so it can be found by binary search after sorting the data along outcomes.
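The last point above — that the quantile estimating equation's root lands on an observed outcome and is found by sort-plus-binary-search — can be sketched as follows. This is an illustrative sketch, not the authors' implementation; the weights `w` stand in for (normalized) inverse-propensity weights:

```python
from bisect import bisect_left
from itertools import accumulate

def weighted_quantile_root(y, w, tau):
    # Solve the empirical estimating equation
    #   sum_i w_i * (1{Y_i <= theta} - tau) = 0
    # for theta. The left-hand side is a step function that jumps
    # only at observed outcomes, so the root must be an observed
    # Y_i; sorting once and binary-searching the cumulative weights
    # locates it in O(N log N).
    pairs = sorted(zip(y, w))
    ys = [p[0] for p in pairs]
    cum_w = list(accumulate(p[1] for p in pairs))
    target = tau * cum_w[-1]
    j = bisect_left(cum_w, target)
    return ys[j]

# With uniform weights this reduces to the ordinary sample quantile.
median = weighted_quantile_root(list(range(1, 101)), [1.0] * 100, 0.5)
```

Non-uniform weights shift which observed outcome the cumulative weight crosses the target at, which is exactly how the IPW correction enters the quantile estimate.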