Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Doubly Robust Kernel Statistics for Testing Distributional Treatment Effects

Authors: Jake Fawkes, Robert Hu, Robin J. Evans, Dino Sejdinovic

TMLR 2024

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "Finally, we experimentally and theoretically demonstrate the validity of these tests." ... "3. We experimentally validate the performance of our test on synthetic, semisynthetic and real data." ... "5 Experiments"
Researcher Affiliation | Collaboration | Jake Fawkes (Department of Statistics, University of Oxford); Robert Hu (Amazon); Robin J. Evans (Department of Statistics, University of Oxford); Dino Sejdinovic (School of Mathematical Sciences, University of Adelaide)
Pseudocode | No | The paper contains mathematical derivations and descriptions of algorithms, but no explicitly labeled "Pseudocode" or "Algorithm" block with structured code-like steps.
Open Source Code | Yes | "An implementation of our approach can be found at: https://github.com/Jakefawkes/DR_distributional_test."
Open Datasets | Yes | "We evaluate on two standard semi-synthetic tasks: the infant health and development program (IHDP) introduced in Hill (2011), and the linked births and deaths data (LBIDD) (Shimoni et al., 2018)."
Dataset Splits | No | The paper mentions data "randomly split into train/test sets, DTr, DTe" but does not specify exact percentages or sample counts for these splits. It also mentions "We run these experiments with 2000 data points" for simulated data, but this is a total, not a split.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks).
Experiment Setup | Yes | "For both settings we fit a linear logistic regression for the propensity score so that the model is incorrectly specified." ... "We run these experiments with 2000 data points, rejecting at the 0.05 significance level." ... "The matching for all statistics is done via logistic regression and we apply the permutation from Section 4." ... "We again use logistic regression matching and weights model."
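The experiment setup quoted in the last row can be sketched as a toy reproduction: fit a deliberately misspecified linear logistic propensity model on 2000 points, then calibrate a test statistic with a permutation null at the 0.05 level. The statistic below is a simple inverse-propensity-weighted mean difference standing in for the paper's doubly robust kernel statistic, and the permutation scheme is a plain label shuffle rather than the paper's Section 4 procedure; all names and modeling details here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000        # "We run these experiments with 2000 data points"
alpha = 0.05    # "rejecting at the 0.05 significance level"

# Simulated data: treatment follows a nonlinear propensity, so a
# *linear* logistic model for it is misspecified, as in the paper.
X = rng.normal(size=(n, 2))
p_true = 1 / (1 + np.exp(-(X[:, 0] ** 2 - 1)))
T = rng.binomial(1, p_true)
Y = X[:, 0] + rng.normal(size=n)  # no treatment effect: the null holds

def fit_linear_logistic(X, T, lr=0.1, steps=500):
    """Plain gradient-ascent logistic regression with linear features
    (hence incorrectly specified for the nonlinear propensity above)."""
    Z = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Z.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-Z @ w))
        w += lr * Z.T @ (T - p) / len(X)
    return 1 / (1 + np.exp(-Z @ w))

e_hat = fit_linear_logistic(X, T)

def ipw_stat(T, Y, e):
    """Inverse-propensity-weighted mean difference: a simple stand-in
    for the paper's doubly robust kernel statistic."""
    return np.mean(T * Y / e) - np.mean((1 - T) * Y / (1 - e))

observed = ipw_stat(T, Y, e_hat)

# Permutation null: shuffle treatment labels and recompute the statistic.
perm = np.array([ipw_stat(rng.permutation(T), Y, e_hat) for _ in range(200)])
p_value = float(np.mean(np.abs(perm) >= np.abs(observed)))
reject = p_value < alpha
```

With no true treatment effect, the test should reject only at roughly the nominal 0.05 rate over repeated simulations.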