A Meta-learner for Heterogeneous Effects in Difference-in-Differences

Authors: Hui Lan, Haoge Chang, Eleanor Wiske Dillon, Vasilis Syrgkanis

ICML 2025

Reproducibility Variable Result LLM Response
Research Type | Experimental | Empirical results demonstrate the superiority of our approach over existing baselines. [...] We demonstrate using synthetic and semi-synthetic experiments that the proposed meta-learner outperforms prior baselines. Finally, we applied our method on a real-world case study on the effects of raising minimum wage on teen employment. Our flexible doubly robust meta-learner automatically identified dimensions and patterns of heterogeneity that had not been highlighted in prior literature.
Researcher Affiliation | Collaboration | (1) Institute of Computational and Mathematical Engineering, Stanford University, Stanford, USA; (2) Department of Economics, Columbia University; (3) Microsoft Research, New England; (4) Department of Management Science and Engineering, Stanford University, Stanford, USA. Correspondence to: Hui Lan <EMAIL>.
Pseudocode | No | The paper describes methods using mathematical formulations and prose, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The code for this work will be made available upon publication.
Open Datasets | Yes | We applied our proposed approach to the minimum wage dataset that is also studied in Callaway & Sant'Anna (2021) and Callaway (2023).
Dataset Splits | No | Section 6.1 mentions evaluating on a "held out test set", and Section 6.2 mentions a "held out validation set" and "test set", implying that the data was split. However, the text does not give split percentages, sample counts, or a procedure that would allow the partitioning to be reproduced.
Hardware Specification | No | The paper does not explicitly mention any specific hardware used for running its experiments, such as GPU/CPU models, memory, or cloud computing specifications.
Software Dependencies | No | The paper mentions using "neural net, XGBoost, and linear models" and "logistic regression" but does not provide specific version numbers for any software libraries or frameworks used.
Experiment Setup | Yes | We considered three different final-stage models for the CATT: neural net, XGBoost, and linear models, to fit the meta-learners. The propensity function, i.e. P(D = 1|W), is always fitted using logistic regression. The data has 20 covariates, and the CATT learners look at the projection onto 5 covariates. ... The probability of receiving treatment is generated from the logistic transformation of a linear function of two binary region variables and the log average payment information for year 2001, i.e. 2·(region 3) − 2·(region 4) + ((log average pay) − 10). The time trend, i.e. Y_post(0) − Y_pre(0), is generated by 0.1·(log average pay) + 0.1·(region 3) + 0.1·(years after treatment) + (region 4)·(years after treatment)² + (log average pay)^(1/2)·(log average population). The treatment effect is defined as 0.1·(log average population) + 0.1·(log average population)^(1/2).
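The synthetic data-generating process described in the setup above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the covariate distributions, sample size, noise scales, and variable names are all assumptions, and the formulas follow one plausible reading of the extracted text (dropped operators are taken to be minus signs and multiplication, and "1/2" terms are read as square-root exponents).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Assumed covariate distributions (not specified in the excerpt).
log_avg_pay = rng.normal(10.0, 1.0, n)   # log average payment, year 2001
log_avg_pop = rng.normal(8.0, 1.0, n)    # log average population (kept > 0)
region3 = rng.binomial(1, 0.3, n)        # binary region indicator
region4 = rng.binomial(1, 0.3, n)        # binary region indicator
years_after = rng.integers(1, 4, n)      # years after treatment

# Propensity: logistic transform of a linear index in the two region
# dummies and the centered log average pay, per the setup description.
index = 2 * region3 - 2 * region4 + (log_avg_pay - 10)
p_treat = 1 / (1 + np.exp(-index))
D = rng.binomial(1, p_treat)

# Time trend for the untreated potential outcomes, Y_post(0) - Y_pre(0).
trend = (0.1 * log_avg_pay + 0.1 * region3 + 0.1 * years_after
         + region4 * years_after**2
         + np.sqrt(log_avg_pay) * log_avg_pop)

# Heterogeneous treatment effect (CATT) as a function of population.
tau = 0.1 * log_avg_pop + 0.1 * np.sqrt(log_avg_pop)

# Observed panel outcomes under a parallel-trends structure
# (baseline level and noise scale are assumptions).
Y_pre = rng.normal(0.0, 1.0, n)
Y_post = Y_pre + trend + D * tau + rng.normal(0.0, 0.1, n)
```

A CATT meta-learner would then be fit on (Y_post − Y_pre, D, covariates), with the propensity P(D = 1 | W) estimated by logistic regression as the report states.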