Beyond Induction Heads: In-Context Meta Learning Induces Multi-Phase Circuit Emergence

Authors: Gouki Minegishi, Hiroki Furuta, Shohei Taniguchi, Yusuke Iwasawa, Yutaka Matsuo

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this paper, we experimentally clarify how such meta-learning ability is acquired by analyzing the dynamics of the model's circuit during training."
Researcher Affiliation | Academia | Gouki Minegishi¹, Hiroki Furuta¹, Shohei Taniguchi¹, Yusuke Iwasawa¹, Yutaka Matsuo¹. ¹The University of Tokyo. Correspondence to: Gouki Minegishi <EMAIL>.
Pseudocode | No | The paper describes the network structure and attention computation using equations in Section 3.2 and Appendix B.1, but does not provide structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/gouki510/In-Context-Meta-Learning
Open Datasets | Yes | "Specifically, we use the SST-2 dataset from the GLUE benchmark, consisting of 872 sentiment-labeled samples."
Dataset Splits | No | The paper describes the generation of examples and queries for its In-Context Meta-Learning setting in Section 3.1 and mentions using the SST-2 dataset in a 2-shot setup in Section 6. However, it does not specify explicit training, validation, or test splits (e.g., percentages or counts) for model training.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, or memory) used for its experiments.
Software Dependencies | No | The paper lists training details such as the optimizer (vanilla SGD) and loss function (cross-entropy) in Table 3. However, it does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | "Following prior research (Reddy, 2023), we use a two-layer attention-only transformer shown in Figure 1-(b)... The classifier is a two-layer MLP with ReLU activations, followed by a softmax layer producing probabilities over L labels. We train this network to classify the query item xq into one of the L labels using cross-entropy loss. Both the query/key dimension and the MLP hidden layer dimension are set to 128. We use a batch size of 128 and optimize with vanilla stochastic gradient descent at a learning rate of 0.01. We use T = 3, K = 64, L = 32, N = 4, D = 63, ε = 0.1, p_B = 0, unless otherwise specified."
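The quoted setup can be sketched as a forward pass. This is a minimal NumPy sketch, not the authors' code: the residual connections, the causal mask, and the reading of N as the number of context exemplars and D as the input item dimensionality are our assumptions about the Reddy (2023)-style architecture; only the widths (query/key and MLP hidden dims of 128, L = 32 labels) and the two-layer ReLU MLP classifier are stated in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hyperparameters quoted in the paper (roles of N and D are assumptions).
D_MODEL = 128   # query/key dimension and MLP hidden width (stated)
L = 32          # number of labels (stated)
N = 4           # context exemplars preceding the query (assumed role)
D = 63          # input item dimensionality (assumed role)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attn_layer(h, wq, wk, wv):
    """Single-head softmax attention with a causal mask (assumed)."""
    q, k, v = h @ wq, h @ wk, h @ wv
    scores = q @ k.T / np.sqrt(wq.shape[1])
    scores[np.triu(np.ones(scores.shape, dtype=bool), k=1)] = -np.inf
    return softmax(scores) @ v


# Toy random weights; the paper trains with vanilla SGD at lr 0.01.
seq_len = N + 1                                  # N exemplars + 1 query
x = rng.normal(size=(seq_len, D))                # one input sequence
w_embed = rng.normal(size=(D, D_MODEL)) * 0.02

h = x @ w_embed
wq1, wk1, wv1 = (rng.normal(size=(D_MODEL, D_MODEL)) * 0.02 for _ in range(3))
h = h + attn_layer(h, wq1, wk1, wv1)             # attention layer 1
wq2, wk2, wv2 = (rng.normal(size=(D_MODEL, D_MODEL)) * 0.02 for _ in range(3))
h = h + attn_layer(h, wq2, wk2, wv2)             # attention layer 2

# Two-layer ReLU MLP classifier on the query position, softmax over L labels.
w1 = rng.normal(size=(D_MODEL, D_MODEL)) * 0.02
w2 = rng.normal(size=(D_MODEL, L)) * 0.02
probs = softmax(np.maximum(h[-1] @ w1, 0.0) @ w2)
```

In training, `probs` would be compared against the query's label with cross-entropy loss over batches of 128 sequences.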