MALTS: Matching After Learning to Stretch
Authors: Harsh Parikh, Cynthia Rudin, Alexander Volfovsky
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We tested MALTS against several other matching methods in simulation studies (Section 6), where ground-truth CATEs are known. In these experiments, MALTS consistently achieves substantially better results than other matching methods, including GenMatch, propensity score matching, and prognostic score matching, for estimating CATEs. Even though our method is heavily constrained to produce interpretable matches, it performs at the same level as non-matching methods that are designed to fit extremely flexible but uninterpretable models directly to the response surface. In Section 3, we introduce the learning-to-match framework and show that, under a choice of smooth distance metric (Definition 1), we can estimate conditional average treatment effects accurately with high probability. Section 4 discusses the MALTS optimization setup and the training procedure that learns a smooth distance metric. In Section 5, we prove that the distance metric learned by MALTS is multi-robust (Definition 3) and generalizable (Definition 5). Thus, the distance metric estimated by the MALTS framework facilitates correct estimation of CATEs under the SUTVA and positivity assumptions. |
| Researcher Affiliation | Academia | Harsh Parikh EMAIL Department of Computer Science Duke University Durham, NC 27708-0129, USA. Cynthia Rudin EMAIL Department of Computer Science Duke University Durham, NC 27708-0129, USA. Alexander Volfovsky EMAIL Department of Statistical Science Duke University Durham, NC 27710, USA. |
| Pseudocode | No | The paper includes a "Figure 1: Schematic drawing of MALTS algorithm" which visually depicts the steps of the algorithm. However, it does not provide a formal pseudocode block or algorithm steps formatted as code within the text. |
| Open Source Code | No | The paper does not provide explicit access to source code for the methodology described. It mentions previous uses and extensions of MALTS in other works but does not offer a link or an affirmative statement of code release for the current paper's method. |
| Open Datasets | Yes | The LaLonde data pertain to the National Supported Work Demonstration (NSW) temporary employment program and its effect on the income level of the participants (LaLonde, 1986). This dataset is frequently used as a benchmark for the performance of methods for observational causal inference. |
| Dataset Splits | Yes | MALTS performs an η-fold honest causal inference procedure with the estimator φ inside each matched group being linear regression. We split the observed samples Sn into η equal parts such that the ratio of treated to control units in each part is similar. For each fold, we use one of the η partitions as the training set Str (not used for matching) and the rest of the η − 1 partitions as the estimation set Sest. Using the output from each of the η folds, we calculate the estimated CATE for each unit (averaged across folds), the estimated distance metric (averaged across folds), and a weighted unified matched group for each unit si ∈ Sn. The weight of each matched unit sk corresponds to the number of times the unit sk was in the matched group of unit si across the η − 1 constructed matched groups. Here, η was chosen to be 5 in our experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory specifications) used for running the experiments. |
| Software Dependencies | No | We used MatchIt's implementation of GenMatch and propensity score matching as it is commonly used by empiricists (Ho et al. 2011). ... We used the causal forest algorithm as implemented in the grf package in R. ... We used Vincent Dorie's R implementation of BART (Dorie et al. 2019). ... Lastly, we implemented 5-fold prognostic score matching using a random forest approach to model the prognostic score function. The paper mentions various software packages and their implementations (e.g., MatchIt, the grf package in R, BART in R), but it does not specify version numbers for these components or for the R environment itself. |
| Experiment Setup | Yes | MALTS has four main hyperparameters: 1) K, the number of nearest neighbors used to estimate the counterfactual, which can be chosen by cross-validation. 2) n, the size of the training set, i.e., the size of the split on the left of Figure 1. This can be chosen based on the amount of data relative to the number of features, though typically we choose it to be 10% of the data. 3) The maximum allowed diameter, or caliper, used to prune bad matched groups. ... 4) The number of repeats, i.e., the number of times we shuffle the data and re-partition it for the MALTS training and estimation procedure. A larger number of repeats of the whole process helps with smoothing out the estimates over different train/test splits. ... Here, η was chosen to be 5 in our experiments. ... We used the causal forest algorithm as implemented in the grf package in R. The settings for causal forest were set to the defaults designed by the grf developers, with the number of trees equal to 2000 and √p + 20 variables tried for each split. |
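The η-fold honest splitting described in the "Dataset Splits" row (equal parts with similar treated-to-control ratios in each) can be sketched as a stratified partition of unit indices. This is an illustrative reconstruction, not the authors' code; the function name and array-based interface are assumptions.

```python
# Sketch of eta-fold splitting stratified by treatment status, so each fold
# keeps a similar treated/control ratio (an assumption-laden reconstruction
# of the procedure quoted above, not the authors' implementation).
import numpy as np

def eta_fold_indices(treatment, eta=5, seed=0):
    """Partition unit indices into eta disjoint folds, shuffling treated
    and control units separately before chunking them across folds."""
    rng = np.random.default_rng(seed)
    treatment = np.asarray(treatment)
    folds = [[] for _ in range(eta)]
    for group in (0, 1):  # 0 = control, 1 = treated
        idx = np.flatnonzero(treatment == group)
        rng.shuffle(idx)
        for f, chunk in enumerate(np.array_split(idx, eta)):
            folds[f].extend(chunk.tolist())
    return [np.array(sorted(f)) for f in folds]

# Each fold in turn serves as the metric-training set Str, while the
# remaining eta - 1 folds form the estimation set Sest.
treatment = np.array([1, 0, 0, 1, 0, 1, 0, 0, 1, 0])
folds = eta_fold_indices(treatment, eta=5)
```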
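The hyperparameter K above is the number of nearest neighbors used to estimate the counterfactual under the learned distance metric. A minimal sketch of that matching step, assuming a diagonal "stretch" metric and a simple difference-in-means estimator inside the matched group (both illustrative simplifications, not the paper's exact estimator φ):

```python
# Minimal K-nearest-neighbor CATE sketch under a learned diagonal metric,
# in the spirit of learning-to-stretch matching; the weights, data, and
# mean-based within-group estimator are illustrative assumptions.
import numpy as np

def knn_cate(x_query, X, y, t, weights, K=10):
    """Estimate the CATE at x_query by matching to the K nearest treated
    and K nearest control units under the weighted Euclidean distance
    d(x, x') = ||diag(weights) (x - x')||_2."""
    d = np.linalg.norm((X - x_query) * weights, axis=1)
    treated = np.flatnonzero(t == 1)
    control = np.flatnonzero(t == 0)
    nn_t = treated[np.argsort(d[treated])[:K]]   # K nearest treated units
    nn_c = control[np.argsort(d[control])[:K]]   # K nearest control units
    return y[nn_t].mean() - y[nn_c].mean()

# Synthetic check: outcome depends on X[:, 0] with a constant effect of 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
t = rng.integers(0, 2, size=200)
y = X[:, 0] + 2.0 * t + rng.normal(scale=0.1, size=200)
tau_hat = knn_cate(np.zeros(3), X, y, t, weights=np.array([1.0, 0.5, 0.5]), K=10)
```

In MALTS itself the metric is learned by minimizing matched-group prediction error on the training fold rather than fixed by hand, and matched groups can also be caliper-pruned as described in point 3 above.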