Towards Better Understanding of In-Context Learning Ability from In-Context Uncertainty Quantification
Authors: Shang Liu, Zhongze Cai, Guanting Chen, Xiaocheng Li
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In addition to developing theories, we empirically demonstrate the effectiveness of Transformers at in-context prediction of the mean and quantification of the variance in regression tasks. We design a series of out-of-distribution (OOD) experiments, a topic that has generated significant interest within the community (Garg et al. (2022); Raventós et al. (2024); Singh et al. (2024)). These experiments provide insights into designing the pretraining process and understanding the ICL capabilities of transformers. |
| Researcher Affiliation | Academia | Shang Liu (Imperial College Business School, Imperial College London); Zhongze Cai (Imperial College Business School, Imperial College London); Guanting Chen (Department of Statistics and Operations Research, University of North Carolina); Xiaocheng Li (Imperial College Business School, Imperial College London) |
| Pseudocode | No | The paper describes methods and derivations in textual format and mathematical equations, but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | No | The training data is generated synthetically based on specified statistical distributions (e.g., P_X: x_t^(i) i.i.d. ∼ N(0, I_d); P_ε: ε_t^(i) i.i.d. ∼ N(0, 1)), rather than using a pre-existing public dataset. No public access information for any dataset is provided. |
| Dataset Splits | No | The validation and testing sets are randomly generated for each evaluation, and the training data is generated afresh for each batch. The paper does not specify fixed or reproducible training/test/validation splits with exact percentages or sample counts, nor does it reference standard predefined splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to conduct the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details with version numbers for the implementation of their work. It only mentions 'transformers package of Hugging Face' in the context of other works, without a version number. |
| Experiment Setup | Yes | Throughout the paper, we consider the dimension d = 8. The batch size is b = 64. All the numerical experiments in our paper run for 200,000 batches. For the basic setup, the two noise-intensity parameters (subscripts not recoverable from the extracted text) are both set to 20. For the OOD experiments, the parameter pairs are: S-OOD (80, 20); M-OOD (100, 400); L-OOD (100, 1600). For the length-shift experiments, models are trained on prompts with lengths ranging from 1 to 44 or from 45 to 100. |
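The synthetic data generation summarized in the table (x drawn from N(0, I_d), noise from N(0, 1), d = 8, batch size 64, prompt lengths up to 44) can be sketched as follows. Note this is a minimal illustration under assumptions: the paper specifies only the covariate and noise distributions here, so the linear task with weights w ~ N(0, I_d) and the function name `sample_batch` are our own choices, not the authors' implementation.

```python
import numpy as np

def sample_batch(batch_size=64, d=8, prompt_len=44, seed=None):
    """Sketch one batch of synthetic in-context regression prompts.

    Covariates x ~ N(0, I_d) and noise eps ~ N(0, 1) follow the report;
    the per-task linear weights w ~ N(0, I_d) are an assumption, since
    the excerpt does not state the task prior.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((batch_size, d))               # assumed task weights
    X = rng.standard_normal((batch_size, prompt_len, d))   # covariates: N(0, I_d)
    eps = rng.standard_normal((batch_size, prompt_len))    # noise: N(0, 1)
    y = np.einsum("bld,bd->bl", X, w) + eps                # responses per prompt
    return X, y

X, y = sample_batch(seed=0)
print(X.shape, y.shape)  # (64, 44, 8) (64, 44)
```

Because each batch is drawn afresh (as the Dataset Splits row notes), reproducing the training data exactly would additionally require the authors' random seeds, which the paper does not report.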