reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Robust Synthetic Control

Authors: Muhammad Amjad, Devavrat Shah, Dennis Shen

JMLR 2018

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments, using both synthetic and real-world datasets, demonstrate that our robust generalization yields an improvement over the classical synthetic control method.
Researcher Affiliation	Academia	Muhammad Amjad EMAIL Operations Research Center Massachusetts Institute of Technology Cambridge, MA 02139, USA; Devavrat Shah EMAIL Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology Cambridge, MA 02139, USA; Dennis Shen EMAIL Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology Cambridge, MA 02139, USA. All affiliations point to Massachusetts Institute of Technology, an academic institution.
Pseudocode	Yes	Algorithm 1 Robust synthetic control; Algorithm 2 Bayesian robust synthetic control
Open Source Code	No	The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets	Yes	We conduct two sets of experiments: (a) on existing case studies from real world datasets referenced in Abadie et al. (2010, 2011); Abadie and Gardeazabal (2003), and (b) on synthetically generated data. For Basque Country: We only use as data the per-capita GDP (outcome variable) of 17 Spanish regions from 1955-1997. Referenced Abadie and Gardeazabal (2003). For California Anti-tobacco Legislation: we use the annual per-capita cigarette consumption at the state-level for all 50 states in the United States, as well as the District of Columbia, from 1970-2015. Referenced Abadie et al. (2010).
Dataset Splits	Yes	Without loss of generality, let the first unit represent the treatment unit exposed to the intervention of interest at time t = T0 + 1. The remaining donor units, 2 ≤ i ≤ N, are unaffected by the intervention for the entire time period [T] = {1, ..., T}. Let T0 be the number of pre-intervention periods with 1 ≤ T0 < T, rendering T − T0 as the length of the post-intervention stage. For synthetic simulations: N = 100, T = 2000, while assuming the treatment was performed at t = 1600.
Hardware Specification	No	The paper mentions "Spark (through alternative least squares) and Tensor-Flow come with built-in SVD implementations" and "computational infrastructure", but no specific GPU/CPU models, memory details, or other hardware specifications are provided.
Software Dependencies	No	The paper mentions "Spark (through alternative least squares) and Tensor-Flow" as tools that can perform SVD, but it does not specify any version numbers for these or other software dependencies.
Experiment Setup	Yes	The algorithm utilizes two hyperparameters: (1) a thresholding hyperparameter µ ≥ 0... and (2) a regularization hyperparameter η ≥ 0. We will employ three different learning procedures as described in the robust synthetic control algorithm: (1) linear regression (η = 0), (2) ridge regression (η > 0), and (3) LASSO (ζ > 0). In order to choose an appropriate choice of the prior parameter α, we first use forward-chaining for the ridge regression setting to find the optimal regularization hyperparameter η. ... we choose α = η/σ̂² where η is the value obtained via forward chaining.
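
The roles of the two hyperparameters in the Experiment Setup row can be illustrated with a minimal NumPy sketch. It is not the authors' code (the table above notes that none was released). It assumes a fully observed donor matrix, reads the thresholding step as hard-thresholding of singular values at µ, and covers only the linear (η = 0) and ridge (η > 0) cases. The function name `robust_synthetic_control`, its arguments, and the toy data are all hypothetical; LASSO, the Bayesian variant, and forward chaining are not shown.

```python
import numpy as np


def robust_synthetic_control(donors, treated_pre, mu, eta=0.0):
    """Illustrative sketch; names and signature are assumptions.

    donors:      (n_donors, T) outcomes of the untreated units
    treated_pre: (T0,) pre-intervention outcomes of the treated unit
    mu:          singular-value threshold (mu >= 0)
    eta:         ridge penalty (eta >= 0; eta = 0 is plain least squares)
    """
    donors = np.asarray(donors, dtype=float)
    treated_pre = np.asarray(treated_pre, dtype=float)
    n_donors = donors.shape[0]
    t0 = treated_pre.shape[0]

    # De-noising: keep only singular components with value >= mu.
    u, s, vt = np.linalg.svd(donors, full_matrices=False)
    keep = s >= mu
    denoised = (u[:, keep] * s[keep]) @ vt[keep, :]

    # Learning: ridge regression on the pre-intervention columns,
    # written as an augmented least-squares problem.
    x_pre = denoised[:, :t0].T  # shape (T0, n_donors)
    design = np.vstack([x_pre, np.sqrt(eta) * np.eye(n_donors)])
    target = np.concatenate([treated_pre, np.zeros(n_donors)])
    weights, *_ = np.linalg.lstsq(design, target, rcond=None)

    # Counterfactual trajectory of the treated unit over all T periods.
    counterfactual = denoised.T @ weights
    return counterfactual, weights


# Toy usage on noiseless rank-2 data (sizes are arbitrary, not the paper's).
rng = np.random.default_rng(0)
n_donors, T, T0 = 5, 40, 30
donors = rng.normal(size=(n_donors, 2)) @ rng.normal(size=(2, T))
true_weights = rng.normal(size=n_donors)
treated = true_weights @ donors  # no intervention effect in this toy example
estimate, weights = robust_synthetic_control(
    donors, treated[:T0], mu=1e-6, eta=1e-8
)
post_rmse = np.sqrt(np.mean((estimate[T0:] - treated[T0:]) ** 2))
```

In this toy setting the treated unit is an exact linear combination of the donors and receives no intervention effect, so the post-period error should be close to zero. With noisy data, µ and η would need to be tuned, for example by the forward chaining the paper mentions.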