Differentially private learners for heterogeneous treatment effects

Authors: Maresa Schröder, Valentyn Melnychuk, Stefan Feuerriegel

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate our DP-CATE across various experiments using synthetic and real-world datasets. To the best of our knowledge, we are the first to provide a framework for CATE estimation that is Neyman-orthogonal and differentially private."
Researcher Affiliation | Academia | "Maresa Schröder, Valentyn Melnychuk & Stefan Feuerriegel, LMU Munich, Munich Center for Machine Learning (MCML)"
Pseudocode | Yes | "We present the pseudo-code for DP-CATE in Supplement C." Supplement C, Algorithm 1: Pseudo-code of our DP-CATE for functions; Algorithm 2: Pseudo-code of DP-CATE for functions (iterative setting).
Open Source Code | Yes | "The source code is available at our GitHub repository. Our experiments are implemented in Python. We provide our code in our GitHub repository: https://github.com/m-schroder/DP-CATE."
Open Datasets | Yes | "We demonstrate our DP-CATE across various experiments using synthetic and real-world datasets. [...] We demonstrate the applicability of DP-CATE to medical datasets by using the MIMIC-III dataset (Johnson et al., 2016) and the TCGA dataset (Weinstein et al., 2013)."
Dataset Splits | Yes | "For each setting, we draw 3000 samples, which we split into train (90%) and test (10%) sets." "Our final dataset contains 14719 samples, which we split into train (90%) and test (10%) sets."
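The 90/10 split quoted above can be sketched as follows. This is a minimal illustration, not the paper's code; the helper name, seed, and use of the standard library `random` module are assumptions.

```python
import random

def split_indices(n_samples, train_frac=0.9, seed=0):
    """Shuffle sample indices and split them into train/test sets.
    Illustrative helper; the paper does not specify its splitting code."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    n_train = int(train_frac * n_samples)
    return idx[:n_train], idx[n_train:]

# 3000 samples, as in the synthetic settings: 2700 train / 300 test.
train_idx, test_idx = split_indices(3000)
```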
Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments.
Software Dependencies | No | "Our experiments are implemented in Python. For the outcome and the propensity estimation, we always employ a multilayer perceptron regression and classification model, respectively. The models consisted of one layer of width 32 with ReLU activation function and were optimized via Adam." However, specific version numbers for Python or any libraries (e.g., PyTorch, TensorFlow) are not provided.
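The nuisance-model setup quoted above (one hidden layer of width 32, ReLU activation, Adam optimizer) can be sketched as below. Since the paper does not name a library, scikit-learn is an assumption here, and the synthetic covariates, treatment, and outcome are placeholders for illustration only.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier, MLPRegressor

# Synthetic placeholder data (the paper uses its own simulated and real datasets).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                         # covariates
A = (X[:, 0] + rng.normal(size=500) > 0).astype(int)  # binary treatment
Y = X[:, 1] + A * X[:, 2] + rng.normal(size=500)      # outcome

# Outcome model (regression) and propensity model (classification), each with
# one hidden layer of width 32, ReLU activation, optimized via Adam.
outcome_model = MLPRegressor(hidden_layer_sizes=(32,), activation="relu",
                             solver="adam", learning_rate_init=0.01,
                             batch_size=128, max_iter=500, random_state=0)
propensity_model = MLPClassifier(hidden_layer_sizes=(32,), activation="relu",
                                 solver="adam", learning_rate_init=0.01,
                                 batch_size=128, max_iter=500, random_state=0)

outcome_model.fit(np.column_stack([X, A]), Y)
propensity_model.fit(X, A)
```

The learning rate (0.01) and batch size (128) are taken from the experiment-setup row below; all other hyperparameters are library defaults.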
Experiment Setup | Yes | "The models consist of one layer of width 32 with ReLU activation function and were optimized via Adam at a learning rate of 0.01 and batch size 128. For our experiments with the finite-query DP-CATE, we implement the pseudo-outcome regression in the second stage as (a) a kernel ridge regression model with a Gaussian kernel and default parameter specifications (KR) and (b) a neural network (NN) with two hidden layers of width 32 with tanh activation function trained in the same manner as the nuisance models. In the experiments for our functional DP-CATE, we employ a Gaussian kernel ridge regression with m = 50 basis functions and default regularization parameter λ = 1."
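Variant (a) of the second stage above, a kernel ridge regression with a Gaussian kernel at default parameters, can be sketched as follows. The pseudo-outcomes here are synthetic placeholders (the paper derives them from the nuisance models), and mapping "Gaussian kernel, default parameters" to scikit-learn's `KernelRidge(kernel="rbf")` is an assumption; sklearn's default `alpha=1.0` happens to match the stated λ = 1.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Placeholder covariates and pseudo-outcomes for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
pseudo_y = X[:, 0] + rng.normal(scale=0.1, size=300)

# Second-stage regression of pseudo-outcomes on covariates:
# Gaussian (RBF) kernel, regularization alpha = 1.0 (cf. lambda = 1).
cate_model = KernelRidge(kernel="rbf", alpha=1.0)
cate_model.fit(X, pseudo_y)
tau_hat = cate_model.predict(X)  # CATE estimates at the query points
```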