Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Doubly Robust Kernel Statistics for Testing Distributional Treatment Effects
Authors: Jake Fawkes, Robert Hu, Robin J. Evans, Dino Sejdinovic
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we experimentally and theoretically demonstrate the validity of these tests. ... 3. We experimentally validate the performance of our test on synthetic, semisynthetic and real data. ... 5 Experiments |
| Researcher Affiliation | Collaboration | Jake Fawkes EMAIL Department of Statistics University of Oxford Robert Hu EMAIL Amazon Robin J. Evans EMAIL Department of Statistics University of Oxford Dino Sejdinovic EMAIL School of Mathematical Sciences University of Adelaide |
| Pseudocode | No | The paper contains mathematical derivations and descriptions of algorithms, but no explicitly labeled 'Pseudocode' or 'Algorithm' block with structured code-like steps. |
| Open Source Code | Yes | An implementation of our approach can be found at: https://github.com/Jakefawkes/DR_distributional_test. |
| Open Datasets | Yes | We evaluate on two standard semi-synthetic tasks, the infant health and development program (IHDP) introduced in Hill (2011), and the linked births and deaths data (LBIDD) (Shimoni et al., 2018). |
| Dataset Splits | No | The paper mentions 'randomly split into train/test sets, DTr, DTe' but does not specify exact percentages or sample counts for these splits. It also mentions 'We run these experiments with 2000 data points' for simulated data, but this is a total number, not a split. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with their version numbers (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | Yes | For both settings we fit a linear logistic regression for the propensity score so that the model is incorrectly specified. ... We run these experiments with 2000 data points, rejecting at the 0.05 significance level. ... The matching for all statistics is done via logistic regression and we apply the permutation from Section 4. ... We again use logistic regression matching and weights model. |
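The setup excerpt above describes the general recipe: fit a (deliberately misspecified) linear logistic regression for the propensity score, compute a weighted test statistic, and calibrate it by permutation at the 0.05 level. The sketch below is not the authors' doubly robust kernel statistic; it is a hedged, NumPy-only illustration of that recipe using a simple inverse-propensity-weighted mean difference as a stand-in statistic, on simulated data with 2000 points as in the reported setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_logistic(X, t, lr=0.1, steps=500):
    """Plain gradient-descent logistic regression for the propensity score.
    (Stand-in for the linear logistic propensity model the paper describes.)"""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - t) / len(t)
    return w

def weighted_mean_diff(y, t, e):
    """Inverse-propensity-weighted difference in outcome means.
    Toy statistic, NOT the paper's kernel statistic."""
    e = np.clip(e, 1e-3, 1 - 1e-3)
    return np.mean(t * y / e) - np.mean((1 - t) * y / (1 - e))

# Simulated data: 2000 points, as in the reported experiments.
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 1])))
y = rng.normal(size=n)  # null case: no distributional treatment effect

w = fit_logistic(X, t)
e = 1.0 / (1.0 + np.exp(-X @ w))
obs = weighted_mean_diff(y, t, e)

# Permutation calibration: shuffle treatment labels and recompute the
# statistic, then reject at the 0.05 significance level.
perm = np.array([weighted_mean_diff(y, rng.permutation(t), e)
                 for _ in range(200)])
p_value = (1 + np.sum(np.abs(perm) >= np.abs(obs))) / (1 + len(perm))
reject = bool(p_value < 0.05)
```

The permutation step mirrors the procedure the paper refers to as "the permutation from Section 4", but the statistic here is only a placeholder; the actual doubly robust kernel statistic is in the linked repository.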