Competition Dynamics Shape Algorithmic Phases of In-Context Learning

Authors: Core Francisco Park, Ekdeep Singh Lubana, Hidenori Tanaka

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Systematically varying these factors, we show our proposed task turns out to be extremely rich, reproducing most known phenomenology of ICL and hence offering a unified setup for studying the concept. Building on this, we start to deconstruct how a model trained on our task performs ICL, finding that there exist (at least) four broad algorithms that can explain the model's behavior. These algorithms combine a fuzzy retrieval vs. inference approach with either unigram or bigram statistics of the context and, as we show, engage in competition dynamics with each other to dictate model behavior. Interestingly, we find the precise experimental conditions (e.g., amount of training and context size) decide which algorithm wins the competition, hence yielding several phases in the model's ability to perform ICL: varying experimental conditions elicits different algorithmic behaviors (at times rather abruptly), making it difficult for an understanding of ICL derived in one configuration to predict model behavior in another.
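The unigram-vs-bigram distinction in the response above can be made concrete with a short sketch (our own illustration, not the authors' code): a unigram strategy predicts the next state from in-context state frequencies, while a bigram strategy predicts from observed transitions out of the current state.

```python
import numpy as np

def unigram_predictor(context, k):
    # Next-state distribution from in-context unigram (frequency) statistics
    counts = np.bincount(context, minlength=k).astype(float)
    return counts / counts.sum()

def bigram_predictor(context, k):
    # Next-state distribution from in-context bigram (transition) statistics,
    # conditioned on the last state in the context
    counts = np.zeros((k, k))
    for a, b in zip(context[:-1], context[1:]):
        counts[a, b] += 1
    row = counts[context[-1]]
    if row.sum() == 0:  # current state never seen as a source: fall back to uniform
        return np.ones(k) / k
    return row / row.sum()
```

On a context like [0, 1, 0, 1, 0], the two strategies disagree sharply: the unigram predictor favors state 0 (it is most frequent), while the bigram predictor puts all mass on state 1 (every observed transition out of state 0 goes to 1).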
Researcher Affiliation Collaboration (1) CBS-NTT Program in Physics of Intelligence, Harvard University; (2) Department of Physics, Harvard University; (3) Physics & Informatics Laboratories, NTT Research, Inc.; (4) EECS Department, University of Michigan, Ann Arbor
Pseudocode Yes To make the dataset clear we attach a minimal PyTorch (Paszke et al., 2019) implementation. We generated this code with the help of GPT-4o (OpenAI, 2024).

import torch
from torch.utils.data import IterableDataset
import numpy as np

class MarkovChainDataset(IterableDataset):
    def __init__(self, k, l, n, seed=None):
        self.k = k  # Number of states
        self.l = l  # Output sequence length
        self.n = n  # Number of transition matrices
        self.seed = seed  # Seed for reproducibility

        if seed is not None:
            np.random.seed(seed)
            torch.manual_seed(seed)

        # Generate n transition matrices, each with k states
        self.transition_matrices = []
        for _ in range(n):
            matrix = np.array([np.random.dirichlet([1] * k) for _ in range(k)])
            self.transition_matrices.append(matrix)

        # Prior distribution over transition matrices
        self.prior = np.random.dirichlet([1] * n)

    def get_stationary_distribution(self, matrix):
        # Compute the stationary distribution of the transition matrix
        eigvals, eigvecs = np.linalg.eig(matrix.T)
        stationary = np.real(eigvecs[:, np.isclose(eigvals, 1)])
        stationary = stationary[:, 0]
        stationary /= stationary.sum()
        return stationary

    def sample_chain(self, transition_matrix):
        # Sample the first state from the stationary distribution
        stationary_distribution = self.get_stationary_distribution(transition_matrix)
        first_state = np.random.choice(self.k, p=stationary_distribution)

        # Generate the rest of the sequence
        sequence = [first_state]
        for _ in range(1, self.l):
            current_state = sequence[-1]
            next_state = np.random.choice(self.k, p=transition_matrix[current_state])
            sequence.append(next_state)

        return sequence

    def __iter__(self):
        while True:
            # Choose a transition matrix based on the prior
            matrix_index = np.random.choice(self.n, p=self.prior)
            chosen_matrix = self.transition_matrices[matrix_index]

            # Generate a sequence using the chosen transition matrix
            sequence = self.sample_chain(chosen_matrix)
            yield torch.tensor(sequence)

# Example usage:
# k = 3 (states), l = 10 (sequence length), n = 5 (transition matrices)
dataset = MarkovChainDataset(k=3, l=10, n=5, seed=42)
iterator = iter(dataset)

# Get a sample sequence
sample_sequence = next(iterator)
print(sample_sequence)  # a length-10 tensor of states in {0, 1, 2}

Listing 1: Markov Mixtures Data Generating Process
Open Source Code Yes All code used to run the experiments and analysis are available at https://github.com/cfpark00/markov-mixtures.
Open Datasets Yes We propose a novel sequence modeling task that involves learning to simulate a finite mixture of Markov chains. To make the dataset clear we attach a minimal PyTorch (Paszke et al., 2019) implementation (Listing 1: Markov Mixtures Data Generating Process). We generated this code with the help of GPT-4o (OpenAI, 2024).
Dataset Splits No The sampling process is repeated at every step of training, i.e., the model is unlikely to see the same sequence twice during training (a.k.a. online training). Evaluation. We evaluate trained models on In-Distribution (ID) and Out-Of-Distribution (OOD) chains. For ID evaluations, we select the transition matrix T from the training set T_train; for OOD evaluations, we sample a novel T using a Dirichlet prior.
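The ID/OOD evaluation protocol described here can be sketched as follows (variable names are ours; it reuses the symmetric Dirichlet(1) row prior from Listing 1):

```python
import numpy as np

def sample_transition_matrix(k, rng):
    # Each row is a next-state distribution drawn from a symmetric Dirichlet(1) prior
    return np.stack([rng.dirichlet(np.ones(k)) for _ in range(k)])

rng = np.random.default_rng(0)
T_train = [sample_transition_matrix(3, rng) for _ in range(5)]  # fixed training set

T_id = T_train[rng.integers(len(T_train))]  # ID eval: a matrix seen during training
T_ood = sample_transition_matrix(3, rng)    # OOD eval: a fresh draw from the prior
```

Because the OOD matrix is drawn from the same prior as the training set, OOD sequences are statistically similar but never reduce to retrieving a memorized chain.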
Hardware Specification Yes We trained our models on NVIDIA A100 80 GB GPUs, running 5 experiments in parallel on the same GPU, and hence yielding a wallclock time of 3.5 hours.
Software Dependencies No We adapted code from nanoGPT (Karpathy, 2022), and implemented Rotary Positional Embedding (RoPE) (Su et al., 2023) instead of the default learned positional embedding, which significantly delayed the emergence of Bi-Inf. To make the dataset clear we attach a minimal PyTorch (Paszke et al., 2019) implementation. We generated this code with the help of GPT-4o (OpenAI, 2024).
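For reference, the rotation at the heart of RoPE can be sketched in a few lines of NumPy (an illustrative sketch of the standard formulation, not the repository's implementation; in practice it is applied to per-head query/key vectors inside attention, and the dimension-pairing convention varies across implementations):

```python
import numpy as np

def rope(x, base=10000.0):
    # Apply a rotary positional embedding to x of shape (seq_len, d), d even.
    # Dimension pairs (i, i + d/2) are rotated by position-dependent angles.
    seq_len, d = x.shape
    half = d // 2
    freqs = base ** (-np.arange(half) / half)      # per-pair rotation frequencies
    angles = np.outer(np.arange(seq_len), freqs)   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=1)
```

Because positions enter only through rotation angles, the dot product of two rotated vectors depends on their relative offset, which is what distinguishes RoPE from a learned absolute positional embedding.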
Experiment Setup Yes We used the AdamW optimizer (Loshchilov & Hutter, 2019) with learning rate 6 × 10^-4. We kept the estimated FLOPs constant at 1 × 10^16, where we estimate compute as 6DN (D denotes the number of tokens seen during training and N denotes the number of model parameters). Using a batch size of 128 resulted in a training run of 107,978 steps.
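The step count follows from the 6DN estimate: a FLOP budget C affords D = C / (6N) tokens, and the number of steps is D / (batch_size × sequence_length). A hypothetical sketch (the excerpt does not state N or the sequence length, so the values below are placeholders, not the paper's configuration):

```python
def training_steps(flop_budget, n_params, batch_size, seq_len):
    # Tokens affordable under the compute estimate FLOPs ~= 6 * D * N
    tokens = flop_budget / (6 * n_params)
    # Each optimizer step consumes batch_size * seq_len tokens
    return int(tokens // (batch_size * seq_len))

# Placeholder model size and context length (not given in the excerpt):
print(training_steps(1e16, 1_000_000, 128, 256))
```

Holding C fixed while varying N trades model size against tokens seen, which is what makes the fixed-FLOPs comparison across runs meaningful.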