Transformers Handle Endogeneity in In-Context Linear Regression

Authors: Haodong Liang, Krishna Balasubramanian, Lifeng Lai

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experiments validate these theoretical findings, showing that the trained transformer provides more robust and reliable in-context predictions and coefficient estimates than the 2SLS method in the presence of endogeneity. We conduct a simulation study to evaluate the performance of the ICL-pretrained transformer model in handling endogeneity.
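For context on the 2SLS baseline the paper compares against, here is a minimal numpy sketch of the classic two-stage least squares estimator. This is not the paper's code; the function name and data-generating setup below are illustrative.

```python
import numpy as np

def two_stage_least_squares(X, Z, y):
    """Classic 2SLS: project X onto the instruments Z, then regress y on the projection."""
    # First stage: fitted values X_hat = Z (Z'Z)^{-1} Z'X
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    # Second stage: beta_hat = (X_hat'X)^{-1} X_hat'y
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

# Illustrative synthetic data with endogeneity (dimensions p = 5, q = 10 as in the paper):
# a shared confounder u enters both X and y, so OLS is biased but 2SLS is consistent.
rng = np.random.default_rng(0)
n, p, q = 50_000, 5, 10
beta = np.arange(1.0, p + 1.0)            # true coefficients (hypothetical)
Pi = rng.normal(size=(q, p))              # first-stage coefficients
u = rng.normal(size=n)                    # unobserved confounder
Z = rng.normal(size=(n, q))               # instruments: independent of u
X = Z @ Pi + np.outer(u, np.ones(p)) + rng.normal(size=(n, p))
y = X @ beta + 2.0 * u + rng.normal(size=n)
beta_hat = two_stage_least_squares(X, Z, y)
```

With strong instruments and large n, `beta_hat` lands close to the true `beta` despite the endogenous regressors.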
Researcher Affiliation | Academia | Haodong Liang, UC Davis; Krishnakumar Balasubramanian, UC Davis; Lifeng Lai, UC Davis
Pseudocode | Yes | Algorithm 1 (In-Context Distribution P); Algorithm 2 (Extracting the regression coefficients)
Open Source Code | No | The paper does not contain any explicit statements about code release or links to code repositories. Phrases like "We release our code" or "The source code is available at" are absent.
Open Datasets | Yes | We use the dataset from the study of Angrist & Evans (1998).
Dataset Splits | Yes | We set the maximum input sample size to 51 (n = 50 training samples and one query sample)... For each run we randomly select 50 samples from the dataset, and make the boxplot of the estimated β over 500 runs.
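The evaluation protocol quoted here (draw 50 samples per run, collect the estimated β over 500 runs for a boxplot) can be sketched as a simple resampling loop. This is a generic sketch, not the paper's pipeline; the estimator passed in is a placeholder.

```python
import numpy as np

def resample_estimates(X, y, estimator, n_runs=500, n_samples=50, seed=0):
    """Repeatedly subsample (X, y) without replacement and collect one
    coefficient estimate per run, e.g. for a boxplot of beta-hat."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_runs):
        idx = rng.choice(len(y), size=n_samples, replace=False)
        estimates.append(estimator(X[idx], y[idx]))
    return np.asarray(estimates)  # shape: (n_runs, p)

# Illustrative usage with an OLS estimator on synthetic data:
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(size=1000)
ols = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
est = resample_estimates(X, y, ols)  # 500 runs of 50 samples each
```

The resulting `(500, 3)` array is what one would feed to a boxplot routine, one box per coefficient.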
Hardware Specification | Yes | The training of the transformer in our experiment was conducted on a Windows 11 machine with the following specifications: GPU: NVIDIA GeForce RTX 4090; CPU: Intel Core i9-14900KF; Memory: 32 GB DDR5-5600.
Software Dependencies | No | The paper mentions "GPT-2 settings" for the transformer backbone but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | We set the maximum input sample size to 51 (n = 50 training samples and one query sample), the dimension of endogenous variable p = 5, and the dimension of instrument q = 10. The backbone of the transformer block is initialized using GPT-2 settings, with 12 attention heads (M = 12), 80-dimensional embedding space (D = 80) and 2 layers (L0 = 2)... We employ the looped transformer architecture, consisting of 10 identical cascading transformer blocks. The transformer model is trained under the ICL loss (11) with a batch size of N = 64, over a total of 300,000 training steps. The noise level σϵ is set to 1.
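The reported hyperparameters can be collected into a single configuration object, which makes the setup easier to scan and reuse. The values below are exactly those stated in the paper; the dataclass and its field names are illustrative, not the authors' code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ICLTrainingConfig:
    """Experiment hyperparameters as reported; field names are illustrative."""
    n_context: int = 50          # training samples per prompt (plus one query sample)
    p: int = 5                   # dimension of the endogenous variable
    q: int = 10                  # dimension of the instrument
    n_heads: int = 12            # M, attention heads
    d_embed: int = 80            # D, embedding dimension
    n_layers_per_block: int = 2  # L0, GPT-2-style layers per block
    n_loops: int = 10            # identical cascading blocks (looped transformer)
    batch_size: int = 64         # N
    train_steps: int = 300_000
    noise_level: float = 1.0     # sigma_epsilon

cfg = ICLTrainingConfig()
```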