NOFLITE: Learning to Predict Individual Treatment Effect Distributions
Authors: Toon Vanderschueren, Jeroen Berrevoets, Wouter Verbeke
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 (Empirical results): We compare the different models for the IHDP, EDU, and News data sets in Table 2. In terms of log-likelihood, NOFLITE obtains the best performance out of all methods under consideration for each data set. These findings demonstrate NOFLITE's ability to learn accurate individual treatment effect distributions from a variety of observational data sets and associated data generating processes. |
| Researcher Affiliation | Academia | Toon Vanderschueren EMAIL KU Leuven University of Antwerp Jeroen Berrevoets EMAIL University of Cambridge Wouter Verbeke EMAIL KU Leuven |
| Pseudocode | No | The paper describes the NOFLITE architecture, optimization, and inference steps in prose and mathematical formulas, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | All code is available at https://github.com/toonvds/NOFLITE. |
| Open Datasets | Yes | IHDP. The Infant Health and Development Program (IHDP; Hill, 2011) is a semi-synthetic data set that is commonly used to evaluate machine learning models for causal inference. ... We use the 100 replications from github.com/clinicalml/cfrnet (Shalit et al., 2017). EDU. The Education data set (EDU; Zhou et al., 2022) measures the effect of providing a mother with adult education benefits on their children's learning. News. The News data set (Johansson et al., 2016) shows the effect of reading an article on either mobile or desktop (t) on the reader's experience (y), based on the article's content in word counts (x). |
| Dataset Splits | No | The paper mentions using semi-synthetic datasets like IHDP, EDU, and News, and refers to 100 replications for IHDP. However, it does not explicitly provide details about the specific training, validation, and test splits (e.g., percentages, sample counts, or methodology) used for the experiments within the main text. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper mentions the use of the Adam optimizer, exponential linear units (ELU), and wandb for hyperparameter tuning. However, it does not provide specific version numbers for these or other key software components or libraries (e.g., Python, PyTorch, TensorFlow) that would be needed to replicate the experiments. |
| Experiment Setup | Yes | We add more information on the chosen hyperparameters in Table 3. ... Table 3: Hyperparameter tuning. We show the optimal hyperparameters for the different data sets. ... Training is done using gradient descent with the Adam optimizer (Kingma & Ba, 2015). |
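The quoted setup states that training uses the Adam optimizer (Kingma & Ba, 2015). As background for that optimizer only (this is not the paper's training code, which is in the linked repository), a minimal pure-Python sketch of the Adam update rule for a single scalar parameter:

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update (Kingma & Ba, 2015) for a scalar parameter.

    m, v are the running first- and second-moment estimates; t is the
    1-indexed step count used for bias correction.
    """
    m = b1 * m + (1 - b1) * grad            # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2       # second-moment (uncentered variance)
    m_hat = m / (1 - b1 ** t)               # bias-corrected moments
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(x) = x^2, whose gradient is 2x.
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 5001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.05)
```

After a few thousand steps the parameter settles near the minimum at zero; the hyperparameter defaults above (`lr`, `b1`, `b2`, `eps`) are the standard ones from the Adam paper, not values reported for NOFLITE (those are in the paper's Table 3).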