reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

DoubleML - An Object-Oriented Implementation of Double Machine Learning in Python

Authors: Philipp Bach, Victor Chernozhukov, Malte S. Kurz, Martin Spindler

JMLR 2022 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To illustrate this effect, we simulate data from a PLR model. A naive ML approach consists of estimating g0 with ML methods, for example using random forests, and then plugging-in predictions ĝ0 to eventually obtain a naive estimate of θ0 from an OLS regression of Equation (1). The arising bias is substantial as illustrated in Figure 1a. As an alternative, we can partial out the effect of X on Y and X on D by estimating ĝ0 and m̂0 with ML methods. θ0 can then be estimated from an OLS regression of Y − ĝ0(X) on D − m̂0(X). This approach implements a Neyman orthogonal score function that identifies θ0. As shown in Figure 1c, the corresponding estimator is robust to the regularization bias. (...) Figure 2 provides a summary of the object-oriented structure and a code snippet demonstrating the API of the DoubleML package. (...) coef std err t P>|t| 2.5 % 97.5 % d 0.5161 0.0750 6.8805 0.0000 0.3691 0.6631
Researcher Affiliation	Academia	Philipp Bach EMAIL Victor Chernozhukov EMAIL Malte S. Kurz EMAIL Martin Spindler EMAIL Faculty of Business Administration, University of Hamburg, Moorweidenstraße 18, 20148 Hamburg, Germany Department of Economics and Center for Statistics and Data Science, Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, MA 02142, USA
Pseudocode	No	The paper includes a code snippet in Figure 2, but it is an actual Python code example demonstrating the API rather than a pseudocode block or algorithm.
Open Source Code	Yes	DoubleML is an open-source Python library implementing the double machine learning framework of Chernozhukov et al. (2018) for a variety of causal models. (...) Source code, documentation and an extensive user guide can be found at https://github.com/DoubleML/doubleml-for-py and https://docs.doubleml.org.
Open Datasets	No	The paper mentions simulating data to illustrate effects: "To illustrate this effect, we simulate data from a PLR model." and "df = make_irm_data(return_type='DataFrame', n_obs=1000, theta=0.5)" in the code snippet. It does not use any pre-existing publicly available datasets.
Dataset Splits	No	The paper discusses 'sample splitting' as a key ingredient of the DML framework, stating "Sample splitting in K folds is applicable and the usage of repeated cross-fitting is recommended to obtain more efficient estimates." This describes the general methodology but does not provide specific split information (percentages, counts, or references to predefined splits) for any experiment conducted in the paper. The synthetic data generation does not specify splits.
Hardware Specification	No	The paper does not explicitly describe any specific hardware (e.g., GPU/CPU models, memory) used for running experiments or developing the software.
Software Dependencies	No	The package is distributed under the MIT license and relies on core libraries from the scientific Python ecosystem: scikit-learn, numpy, pandas, scipy, statsmodels and joblib. The paper lists software dependencies but does not provide specific version numbers for them (e.g., 'scikit-learn 0.24.1').
Experiment Setup	Yes	dml_model = DoubleMLIRM(dml_data, RandomForestRegressor(max_depth=5), RandomForestClassifier(max_depth=5), score='ATE')
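
The partialling-out idea quoted in the Research Type row can be sketched in a few lines. This is a minimal illustration, not the DoubleML implementation: it assumes a hypothetical one-dimensional PLR data-generating process, swaps the random forests mentioned in the paper for polynomial least squares so that the block depends only on NumPy, and partials out X from Y by regressing Y on X directly. The function names, the number of folds, and all simulation settings are illustrative assumptions.

```python
import numpy as np


def poly_features(x, degree=5):
    # Design matrix with columns 1, x, x^2, ..., x^degree.
    return np.vander(x, degree + 1, increasing=True)


def fit_predict(x_train, target_train, x_test, degree=5):
    # Stand-in nuisance learner (assumption): polynomial least squares
    # instead of the random forests used in the paper.
    coef, *_ = np.linalg.lstsq(poly_features(x_train, degree), target_train, rcond=None)
    return poly_features(x_test, degree) @ coef


def partialling_out_estimate(y, d, x, n_folds=5, seed=0):
    # Cross-fitting: nuisance predictions for each fold come from models
    # fitted on the remaining folds only.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    y_res = np.empty_like(y)
    d_res = np.empty_like(d)
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        y_res[test_idx] = y[test_idx] - fit_predict(x[train_mask], y[train_mask], x[test_idx])
        d_res[test_idx] = d[test_idx] - fit_predict(x[train_mask], d[train_mask], x[test_idx])
    # OLS (no intercept) of the Y residuals on the D residuals.
    return float(d_res @ y_res / (d_res @ d_res))


# Hypothetical PLR simulation: Y = theta0 * D + g0(X) + u, D = m0(X) + v.
rng = np.random.default_rng(42)
n_obs, theta0 = 2000, 0.5
x = rng.uniform(-2.0, 2.0, n_obs)
d = np.sin(x) + rng.normal(size=n_obs)
y = theta0 * d + np.cos(x) + rng.normal(size=n_obs)

theta_hat = partialling_out_estimate(y, d, x)
print(f"theta_hat = {theta_hat:.3f} (true value {theta0})")
```

With both residualizations in place, the estimate should land close to the true θ0, up to sampling noise. The package itself wraps this logic, together with inference and other score functions, behind classes such as the DoubleMLIRM call quoted in the Experiment Setup row.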