Multi-Source Causal Inference Using Control Variates under Outcome Selection Bias

Authors: Wenshuo Guo, Serena Lutong Wang, Peng Ding, Yixin Wang, Michael Jordan

TMLR 2022

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Across simulations and two case studies with real data, the paper shows that the control variate-based ATE estimator consistently and significantly reduces variance relative to different baselines. |
| Researcher Affiliation | Academia | Wenshuo Guo (EMAIL), Department of Electrical Engineering and Computer Sciences, University of California, Berkeley; Serena Wang (EMAIL), Department of Electrical Engineering and Computer Sciences, University of California, Berkeley; Peng Ding (EMAIL), Department of Statistics, University of California, Berkeley; Yixin Wang (EMAIL), Department of Statistics, University of Michigan; Michael I. Jordan (EMAIL), Department of Electrical Engineering and Computer Sciences, and Department of Statistics, University of California, Berkeley |
| Pseudocode | No | The paper describes algorithms conceptually (e.g., "We propose an algorithm to estimate causal effects...") and refers to algorithms by others, but it does not contain a clearly labeled "Pseudocode" or "Algorithm" block for its own methodology. |
| Open Source Code | No | The paper states only that "All code will be made publicly available"; no code is released with the paper. |
| Open Datasets | Yes | Case study 1: flu shot encouragement with selection bias from case-control studies. The paper uses a flu shot encouragement experiment dataset that has been repeatedly studied in the causal inference literature (McDonald et al., 1992; Hirano et al., 2000; Ding & Lu, 2017). Case study 2: spam email detection with selection bias from implicit feedback. The paper uses a dataset constructed for the Atlantic Causal Inference Conference (ACIC) 2019 Data Challenge based on the Spambase dataset for spam email detection from UCI (Gruber et al., 2019; Dua & Graff, 2017). |
| Dataset Splits | No | The paper describes generating datasets O1 and O2 with given sizes (e.g., n1 = 30,000; n2 = 3,000 or 10,000), but does not specify how these datasets are further split into training, validation, or test sets beyond their initial generation. |
| Hardware Specification | No | The paper does not provide any specific hardware details, such as GPU models, CPU types, or memory specifications, used for running its experiments. |
| Software Dependencies | No | The paper does not specify any particular software names with version numbers (e.g., Python 3.8, PyTorch 1.9) used to implement the experiments. |
| Experiment Setup | No | The paper mentions using logistic models and neural networks, running bootstrap replicates (B = 100 or B = 300), and using five-fold cross-validation for neural network architecture selection (details in Appendix E). However, it does not explicitly provide specific hyperparameter values (e.g., learning rate, batch size, epochs) for these models in the main text. |
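The variance reduction the assessment refers to rests on the classical control-variate adjustment: subtract from an estimator a scaled version of a correlated statistic whose expectation is known. The sketch below is not the paper's estimator; the data-generating process, variable names, and target quantities are illustrative assumptions used only to show the generic mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: theta_hat estimates E[y] = 2; psi_hat = mean(z) is a
# correlated statistic with known expectation 0. The control-variate
# adjustment is theta_cv = theta_hat - c * (psi_hat - E[psi]).
n_reps, n = 2000, 200
theta_hats, psi_hats = [], []
for _ in range(n_reps):
    z = rng.normal(size=n)
    y = 2.0 + z + rng.normal(scale=0.5, size=n)  # target: E[y] = 2
    theta_hats.append(y.mean())                  # naive estimator
    psi_hats.append(z.mean())                    # control variate, E[z] = 0

theta_hats = np.asarray(theta_hats)
psi_hats = np.asarray(psi_hats)
psi_mean = 0.0

# Variance-minimizing coefficient: c* = Cov(theta_hat, psi_hat) / Var(psi_hat)
c = np.cov(theta_hats, psi_hats)[0, 1] / np.var(psi_hats)
theta_cv = theta_hats - c * (psi_hats - psi_mean)

print(np.var(theta_hats), np.var(theta_cv))  # adjusted variance is smaller
```

Both estimators remain (approximately) unbiased for the same target; the adjustment only removes the variance component explained by the control variate, which is why reported gains are stated in terms of variance rather than bias.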