Towards Better Understanding of In-Context Learning Ability from In-Context Uncertainty Quantification

Authors: Shang Liu, Zhongze Cai, Guanting Chen, Xiaocheng Li

TMLR 2025

Reproducibility Assessment (Variable: Result, followed by the LLM response)
Research Type: Experimental
"In addition to developing theories, we empirically demonstrate the effectiveness of Transformers at in-context prediction of the mean and quantification of the variance in regression tasks. We design a series of out-of-distribution (OOD) experiments, a setting that has generated significant interest within the community (Garg et al., 2022; Raventós et al., 2024; Singh et al., 2024). These experiments provide insights into designing the pretraining process and understanding the ICL capabilities of Transformers."
Researcher Affiliation: Academia
Shang Liu, Imperial College Business School, Imperial College London; Zhongze Cai, Imperial College Business School, Imperial College London; Guanting Chen, Department of Statistics and Operations Research, University of North Carolina; Xiaocheng Li, Imperial College Business School, Imperial College London
Pseudocode: No
The paper describes its methods and derivations in prose and mathematical equations but contains no structured pseudocode or algorithm blocks.
Open Source Code: No
The paper makes no explicit statement about releasing source code for the described methodology and provides no link to a code repository.
Open Datasets: No
The training data is generated synthetically from specified statistical distributions (e.g., P_X: x_t^(i) i.i.d. ~ N(0, I_d); P_ε: ε_t^(i) i.i.d. ~ N(0, 1)) rather than drawn from a pre-existing public dataset. No public access information for any dataset is provided.
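The synthetic generation process quoted above can be sketched as follows. The covariate and noise distributions (x_t^(i) ~ N(0, I_d), ε_t^(i) ~ N(0, 1)) come from the response; the linear label model y = ⟨w, x⟩ + ε and the task-weight prior w ~ N(0, I_d) are assumptions, standard in ICL regression setups such as Garg et al. (2022), not details stated in this checklist.

```python
import numpy as np

def sample_icl_batch(batch_size=64, prompt_len=40, d=8, rng=None):
    """Generate one batch of synthetic in-context regression prompts.

    Covariates and noise follow the distributions quoted in the checklist:
    x_t^(i) ~ N(0, I_d) and eps_t^(i) ~ N(0, 1). The task prior
    w ~ N(0, I_d) and the linear label model are hypothetical.
    """
    rng = rng or np.random.default_rng()
    X = rng.standard_normal((batch_size, prompt_len, d))   # P_X
    eps = rng.standard_normal((batch_size, prompt_len))    # P_eps
    w = rng.standard_normal((batch_size, d))               # assumed task prior
    y = np.einsum("btd,bd->bt", X, w) + eps                # y = <w, x> + eps
    return X, y
```

Because a fresh batch is drawn per training step (see the Dataset Splits row), there is no fixed dataset to release; the generator itself is the data source.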
Dataset Splits: No
The validation and test sets are randomly generated for each evaluation, and the training data is generated afresh for each batch. The paper specifies no fixed or reproducible train/validation/test splits with exact percentages or sample counts, nor does it reference standard predefined splits.
Hardware Specification: No
The paper provides no specific details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies: No
The paper gives no software dependency details with version numbers for its implementation. It mentions the 'transformers' package from Hugging Face only in the context of other works, without a version number.
Experiment Setup: Yes
Throughout the paper, the dimension is d = 8 and the batch size is b = 64; all numerical experiments run for 200,000 batches. Noise-intensity parameters:
- Basic setup: τ = τ = 20
- S-OOD: τ = 80, τ = 20
- M-OOD: τ = 100, τ = 400
- L-OOD: τ = 100, τ = 1600
For the length-shift experiments, models are trained on prompts with lengths ranging from 1 to 44 or from 45 to 100.
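A minimal sketch of the training schedule stated in this row (d = 8, batch size 64, 200,000 freshly generated batches, prompt lengths drawn from one of the two training ranges). Uniform sampling of the per-batch prompt length is an assumption; the checklist states only the range, not the sampling rule.

```python
import numpy as np

# Configuration quoted from the setup; the "short" length-shift regime
# trains on prompt lengths 1..44 (the other regime uses 45..100).
CONFIG = {
    "d": 8,
    "batch_size": 64,
    "num_batches": 200_000,
    "train_len_range": (1, 44),
}

def training_prompt_lengths(config, rng=None):
    """Yield one prompt length per training batch.

    Uniform sampling over the training range is an assumption made for
    illustration, not a detail given in the paper's checklist.
    """
    rng = rng or np.random.default_rng(0)
    lo, hi = config["train_len_range"]
    for _ in range(config["num_batches"]):
        yield int(rng.integers(lo, hi + 1))  # inclusive of both endpoints
```

Pairing this schedule with the batch generator from the data-generation row would reproduce the "fresh batch per step" regime the splits row describes.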