In-Context Deep Learning via Transformer Models

Authors: Weimin Wu, Maojiang Su, Jerry Yao-Chieh Hu, Zhao Song, Han Liu

ICML 2025

Reproducibility Variable Result LLM Response
Research Type | Experimental | We validate our findings on synthetic datasets for 3-layer, 4-layer, and 6-layer neural networks. The results show that ICL performance matches that of direct training. ... In this section, we conduct experiments to verify the capability of ICL to learn feed-forward neural networks, and give details in Appendix F.
Researcher Affiliation | Academia | (1) Center for Foundation Models and Generative AI, Northwestern University, USA; (2) Department of Computer Science, Northwestern University, USA; (3) University of California, Berkeley, USA; (4) Department of Statistics and Data Science, Northwestern University, USA.
Pseudocode | No | The paper describes methods and processes algorithmically but does not present any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | Our code is based on the PyTorch implementation of in-context learning for the transformer (Garg et al., 2022) at https://github.com/dtsip/in-context-learning. This refers to a third-party implementation, not the authors' own source code for the methodology described.
Open Datasets | No | We validate our findings on synthetic datasets for 3-layer, 4-layer, and 6-layer neural networks. ... Specifically, we sample the input of the feed-forward network x ∈ R^d from the Gaussian mixture distribution w1·N(−2, I_d) + w2·N(2, I_d), where w1, w2 ∈ R and d = 20. We consider the network f : R^d → R as a 3-, 4-, or 6-layer NN. We generate the true output by y = f(x). The datasets are synthetic and generated by the authors, with no public access information provided for them.
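The data-generation recipe quoted above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the hidden widths (64), the ReLU activation, and the 1/sqrt(fan-in) weight scale are all assumptions, since the excerpt only fixes d = 20, the mixture means ±2, and the layer counts.

```python
import numpy as np

def sample_inputs(n, d=20, w1=1.0, w2=0.0, seed=0):
    """Sample x from the mixture w1*N(-2, I_d) + w2*N(2, I_d).

    Assumes w1 + w2 = 1 so the weights act as component probabilities.
    """
    rng = np.random.default_rng(seed)
    pick_first = rng.random(n) < w1            # which mixture component each sample uses
    means = np.where(pick_first[:, None], -2.0, 2.0)
    return means + rng.standard_normal((n, d))

def random_relu_mlp(layer_dims, seed=0):
    """Return a random feed-forward f: R^d -> R; layer_dims like [20, 64, 64, 1]."""
    rng = np.random.default_rng(seed)
    Ws = [rng.standard_normal((a, b)) / np.sqrt(a)
          for a, b in zip(layer_dims[:-1], layer_dims[1:])]
    def f(x):
        h = x
        for W in Ws[:-1]:
            h = np.maximum(h @ W, 0.0)         # ReLU hidden layers (assumed activation)
        return (h @ Ws[-1]).squeeze(-1)        # scalar output y = f(x)
    return f

x = sample_inputs(6400)                 # pretraining setting: w1 = 1, so all x ~ N(-2, I_d)
f = random_relu_mlp([20, 64, 64, 1])    # a 3-layer network with hypothetical widths
y = f(x)
```

Swapping `layer_dims` for a longer list (e.g. five or seven entries) gives the 4- and 6-layer variants.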
Dataset Splits | Yes | For the pretraining data, we use 50 in-context examples and sample them from N(−2, I_d). For the testing data, we use 75 in-context examples... The batch size is 64, and the number of batches is 100, i.e., 6400 samples in total. ... We assess performance using the mean R-squared value over all 6400 samples.
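The R-squared metric used for the 6400 test samples can be sketched as below; whether the paper pools all samples into one R-squared or averages per-prompt values is not stated in the excerpt, so the pooled convention here is an assumption.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot
```

Perfect predictions give 1.0, and always predicting the mean of `y_true` gives 0.0, which makes the scale easy to read across the 3-, 4-, and 6-layer settings.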
Hardware Specification | Yes | We conduct all experiments using 1 NVIDIA A100 GPU with 80GB of memory.
Software Dependencies | No | Our code is based on the PyTorch implementation of in-context learning for the transformer (Garg et al., 2022)... No specific versions of PyTorch or other software dependencies are mentioned.
Experiment Setup | Yes | Both models comprise 12 transformer blocks, each with 8 attention heads, and share the same hidden and MLP dimensions of 256. ... In our setting, we sample the pretraining data from N(−2, I_d), i.e., w1 = 1 and w2 = 0. Following the pre-training method in (Garg et al., 2022), we use a batch size of 64. To construct each sample in a batch... The pretraining process iterates for 500k steps. ... We use the MSE loss between the prediction and the true value of o_i. ... train the network with MSE loss for 100 epochs.
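The hyperparameters quoted above can be collected into a configuration sketch. The field names follow common transformer-config conventions and are assumptions, not the authors' exact code; only the values (12 blocks, 8 heads, 256-dim hidden and MLP, batch 64, 500k steps, MSE loss) come from the paper.

```python
from dataclasses import dataclass

@dataclass
class TransformerConfig:
    n_layer: int = 12       # 12 transformer blocks
    n_head: int = 8         # 8 attention heads per block
    n_embd: int = 256       # hidden dimension
    mlp_dim: int = 256      # MLP dimension, equal to the hidden dimension

@dataclass
class TrainConfig:
    batch_size: int = 64
    train_steps: int = 500_000   # pretraining iterates for 500k steps
    loss: str = "mse"            # MSE between prediction and the true o_i

model_cfg = TransformerConfig()
train_cfg = TrainConfig()
```

A config like this would be passed to a GPT-2-style backbone as in the Garg et al. codebase the authors build on.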