Estimating Potential Outcome Distributions with Collaborating Causal Networks

Authors: Tianhui Zhou, William E Carson IV, David Carlson

TMLR 2022

Reproducibility Variable Result LLM Response
Research Type Experimental Finally, we evaluate the performance of CCN in multiple experiments on both synthetic and semi-synthetic data. We demonstrate that CCN learns improved distribution estimates compared to existing Bayesian and deep generative methods, as well as improved decisions with respect to a variety of utility functions. Section 5: Experiments. Table 1: Quantitative results on IHDP (Section 5.2). Table 2: Quantitative results on the EDU dataset (Section 5.3). Table 3: The estimated LL under different simulated distributions (Section 5.4.2). Figure 6: Convergence rate of models as a function of sample size (Section 5.4.3). Section 5.4.4: Ablation Study.
Researcher Affiliation Academia Tianhui Zhou EMAIL Department of Biostatistics and Bioinformatics Duke University Durham, NC 27705, U.S. William E Carson IV EMAIL Department of Biomedical Engineering Duke University Durham, NC 27705, U.S. David Carlson EMAIL Department of Civil and Environmental Engineering Department of Biostatistics and Bioinformatics Department of Computer Science Department of Electrical and Computer Engineering Duke University Durham, NC 27705, U.S.
Pseudocode No The paper describes methods and formulations (e.g., g-loss, full loss L) using mathematical equations and descriptive text, but it does not include any explicitly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code Yes Model and experiment code is available at https://github.com/carlson-lab/collaborating-causal-networks. The code for CCN and its adjustment will be made public on GitHub under the MIT license if the manuscript is accepted.
Open Datasets Yes First, we evaluate causal methods using the Infant Health and Development Program (IHDP) dataset (Hill, 2011)... Simulated replicates for the IHDP data were downloaded directly from https://github.com/clinicalml/cfrnet... The raw education data corresponding to the EDU dataset were downloaded from the Harvard Dataverse5, which consist of 33,167 observations and 378 variables. The dataset does not contain personally identifiable information or offensive content. 5https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/19PPE7
Dataset Splits Yes We use 100 replications of the data for out-of-sample evaluation by following the simulation process of Shalit et al. (2017). (IHDP dataset) ... We keep 1,000 samples for evaluation and use the rest for training. The full procedure is repeated 10 times for variability assessment. (EDU dataset) ... The variability assessment is based on 5-fold cross validation. (Additional Distribution Tests) ... We simulate 40,000 samples in total and hold out 2,000 for evaluation based on log likelihood (LL) with the following procedures: ... with 8/2 split for training and testing. (Ablation Study)
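The held-out splits described above (e.g. the 8/2 train/test split in the ablation study) follow the usual shuffle-and-partition pattern. A minimal sketch, assuming a NumPy-based pipeline; the function name and seed are ours, not from the paper:

```python
import numpy as np

def train_test_split(n_samples, test_frac=0.2, seed=0):
    """Shuffle sample indices and partition them into train/test arrays."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = int(n_samples * test_frac)
    return idx[n_test:], idx[:n_test]

# An 8/2 split as in the ablation study's 40,000 simulated samples:
train_idx, test_idx = train_test_split(40_000, test_frac=0.2)
```

For the variability assessments, the same routine would simply be re-run with different seeds (10 repetitions for EDU) or replaced by a 5-fold partition for the cross-validation setting.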
Hardware Specification Yes The models CCN, CEVAE and GANITE were trained and evaluated on the various datasets described in this work on a machine with a single NVIDIA P100 GPU. The R-based methods BART, CF, and GAMLSS were trained and evaluated on a machine with an Intel(R) Xeon(R) Gold 6154 CPU. The CMGP model was trained and evaluated on a machine with an Intel Core i7 10th generation processor and an NVIDIA GeForce RTX 3090 GPU.
Software Dependencies No The paper mentions specific R packages like 'BayesTree', 'grf', 'gamlss', and Python packages like 'cmgp', and refers to codebases for 'CEVAE_pytorch' and 'GANITE'. However, it does not provide specific version numbers for the programming languages (e.g., Python, R) or major deep learning frameworks (e.g., PyTorch) used to implement these methods or run the experiments. While the 'gamlss' package citation includes 'R package version 5.3-4', this is the only specific version mentioned for a dependency, not a comprehensive list.
Experiment Setup Yes In FCCN, we introduce two latent representations φA(·) and φW(·). We set their dimensions to 25. They are both parameterized through a neural network with a single hidden layer of 100 units. The Wasserstein distance in Wass-loss is learned through D(·), which is a network with two hidden layers of 100 and 60 units per layer. We adopt the weight clipping strategy with threshold (-0.01, 0.01)... We propose a few candidate values for α and β as: 5e-3, 1e-3, 5e-4, 1e-4, 5e-5, 1e-5... we fix α = 5e-4 and β = 1e-5 in IHDP experiments and α = 1e-5 and β = 5e-3 in EDU experiments. For BART, we set the burn-in iteration to 1,000. For CEVAE, the latent confounder size is 20. The optimizer is ADAM... the learning rate is set to 1e-4, and the decay rate to 1e-3 after tuning. For GANITE, the hyperparameters for the supervised loss are set to α = 2 (counterfactual block) and β = 1e-3 (ITE block) after tuning. For CMGP, we use a maximum iteration count (max_gp_iterations parameter) of 100.
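The weight-clipping step used to constrain the critic D(·) is the standard WGAN trick of clamping every parameter to a fixed interval after each critic update. A NumPy sketch of that generic rule, not the authors' code; the layer sizes mirror the 25 → 100 → 60 → 1 critic described above, and the threshold matches the reported (-0.01, 0.01):

```python
import numpy as np

CLIP = 0.01  # clipping threshold reported for the critic D

def clip_weights(params, c=CLIP):
    """Clamp every weight array to [-c, c] after a critic update.

    Keeping the weights bounded is the original WGAN heuristic for
    holding the critic approximately 1-Lipschitz.
    """
    return [np.clip(w, -c, c) for w in params]

# Toy critic weights: input dim 25, hidden layers of 100 and 60 units.
rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.1, size=(25, 100)),
           rng.normal(scale=0.1, size=(100, 60)),
           rng.normal(scale=0.1, size=(60, 1))]
clipped = clip_weights(weights)
```

In a training loop this would run once per critic step, between the gradient update and the next minibatch.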