Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Exploring Transformer Backbones for Heterogeneous Treatment Effect Estimation

Authors: YiFan Zhang, Hanlin Zhang, Zachary Chase Lipton, Li Erran Li, Eric Xing

TMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type: Experimental. "We show empirically that TransTEE can: (1) serve as a general-purpose treatment effect estimator which significantly outperforms competitive baselines on a variety of challenging TEE problems... Moreover, comprehensive experiments on six benchmarks with four types of treatments are conducted to verify the effectiveness of TransTEE. We introduce a new surrogate modeling task to broaden the scope of TEE beyond semi-synthetic evaluation and show that TransTEE is effective in real-world applications like auditing fair predictions of LMs."
Researcher Affiliation: Collaboration. Yi-Fan Zhang (Institute of Automation, University of Chinese Academy of Sciences); Hanlin Zhang (Harvard University); Zachary C. Lipton (Carnegie Mellon University); Li Erran Li; Eric P. Xing (Carnegie Mellon University). "Equal Contribution. Work done outside Amazon."
Pseudocode: No. The paper describes its methods through text and mathematical equations and illustrates the workflow with diagrams (e.g., Figure 3). However, there are no explicitly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code: No. The paper states: "All the assets (i.e., datasets and the codes for baselines) we use include a MIT license containing a copyright notice and this permission notice shall be included in all copies or substantial portions of the software." This refers to the code for baselines and the datasets, not necessarily the authors' own implementation of TransTEE. No explicit statement about, or link to, their own code is provided.
Open Datasets: Yes. "For continuous treatments, we use one synthetic dataset and two semi-synthetic datasets: the IHDP and News datasets. For treatment with continuous dosages, we obtain covariates from a real dataset TCGA (Chang et al., 2013)... Structured treatments include Small-World (SW), which contains... and TCGA (S), which uses... 10,000 molecules from the QM9 dataset (Ramakrishnan et al., 2014)... For the study on language models, we use the Enriched Equity Evaluation Corpus (EEEC) (Feder et al., 2021)."
Dataset Splits: Yes. "The synthetic dataset contains 500 training points and 200 testing points... For continuous and binary treatments, we use the average mean squared error on the test set... Small-World (Kaddour et al., 2021): there are 1,000 units in the in-sample dataset and 500 in the out-sample one... TCGA (S) (Kaddour et al., 2021)... The in-sample dataset consists of 5,000 units and the out-sample one of 4,659 units. Each unit is a covariate vector x ∈ R^4000, and these units are split randomly into in- and out-sample datasets in each run."
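The per-run random in-/out-sample split described above can be sketched as follows. This is a minimal illustration, not the authors' code; the function name and seeding scheme are assumptions.

```python
import random

def in_out_split(n_units, n_in, seed):
    """Randomly partition unit indices into in-sample and out-of-sample sets,
    mirroring the per-run random split described for TCGA (S)."""
    rng = random.Random(seed)
    idx = list(range(n_units))
    rng.shuffle(idx)
    return idx[:n_in], idx[n_in:]

# TCGA (S): 5,000 in-sample and 4,659 out-of-sample units per run
in_idx, out_idx = in_out_split(5000 + 4659, 5000, seed=0)
```

Re-seeding per run (as the quoted text implies) yields a fresh random partition each time while keeping individual runs reproducible.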
Hardware Specification: Yes. "We conduct all the experiments on a machine with i7-8700K CPU, 32G RAM, and four Nvidia GeForce RTX 2080 Ti (10GB) GPU cards."
Software Dependencies: No. The paper mentions general software assets, including "datasets and the codes for baselines," but does not specify particular software libraries, frameworks, or version numbers used to implement its own method.
Experiment Setup: Yes. "Table 7 and Table 8 show the details of the TransTEE architecture and hyper-parameters. For all the synthetic and semi-synthetic datasets, we tune parameters based on 20 additional runs... Table 8: Hyper-parameters on different datasets. Bsz indicates the batch size, # Emb indicates the embedding dimension, Lr. S indicates the learning-rate scheduler (Cos is the cosine annealing learning rate)."
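The cosine annealing schedule referenced in Table 8 ("Cos") can be sketched as a plain function; the parameter names below are illustrative assumptions, not the paper's notation.

```python
import math

def cosine_annealing_lr(step, total_steps, lr_max, lr_min=0.0):
    """Cosine annealing: decay the learning rate from lr_max to lr_min
    following half a cosine period over total_steps optimizer steps."""
    progress = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))
```

The schedule starts at `lr_max` at step 0 and reaches `lr_min` at the final step, decaying slowly at first and fastest in the middle of training.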