Why In-Context Learning Models are Good Few-Shot Learners?
Authors: Shiguang Wu, Yaqing Wang, Quanming Yao
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on linear regression tasks, where the distribution of linear weights is p_δ(τ) = N(µ_δ, Σ_δ) with (µ_δ, Σ_δ) ∼ p(δ). More details are provided in Appendix F. We denote such a meta-level meta-trained ICL model as M2-ICL. After pre-training, we test on unseen domains drawn from p(δ). Each domain provides Ω^tr_δ = {D_τ}_{τ=1..t} for adaptation and Ω^val_δ = {D_τ}_{τ=t+1..T_δ} for performance evaluation. The performance is shown in Figure 6. Note that reasonable solutions include ICL w/ adpt, ICL w/o adpt, and M2-ICL w/ adpt, while M2-ICL w/o adpt serves only as an intermediate product of meta-level meta-learning. We find that M2-ICL w/ adpt outperforms both ICL w/ adpt and ICL w/o adpt, particularly when the number of adaptation tasks is very small (64, Figure 6(a)), while the advantage gradually decreases as the number of tasks grows (marginal with 1024 adaptation tasks, Figure 6(c)). Meta-level meta-learning is effective for fast adaptation on few-task domains, analogous to typical meta-learning's effectiveness for fast adaptation on few-shot tasks. Note that, although the adaptation strategy in this experiment fine-tunes all parameters using gradient descent (i.e., G is derived from MAML with inner updates as full-parameter fine-tuning), any differentiable adaptation strategy can replace the inner update or be incorporated into other specifications of G. The comparison between ICL and M2-ICL is isomorphic to the comparison between a model trained with standard supervised learning and a model meta-trained with MAML. Experimental results on cross-domain few-shot image classification are provided in Appendix H. |
| Researcher Affiliation | Academia | Shiguang Wu, Department of Electronic Engineering, Tsinghua University, EMAIL; Yaqing Wang, Beijing Institute of Mathematical Sciences and Applications, EMAIL; Quanming Yao, Department of Electronic Engineering, Tsinghua University, and State Key Laboratory of Space Network and Communications, Tsinghua University, EMAIL |
| Pseudocode | Yes | Algorithm 1 Training M2-ICL. Input: training domain distribution p(δ), ICL model g(·; θ). 1: while not converged do 2: Sample a domain δ ∼ p(δ). 3: Sample tasks τ ∼ p_δ(τ) to form task sets Ω^tr_δ and Ω^val_δ. 4: for every task τ ∈ Ω^tr_δ do 5: Calculate task loss ℓ_meta(τ, g(·; θ)) = (1/N_τ) Σ_{i=0}^{N_τ−1} ℓ(ŷ^{(i+1)}, y^{(i+1)}) by (12). 6: end for 7: Update θ_δ = θ − α ∇_θ (1/|Ω^tr_δ|) Σ_{τ∈Ω^tr_δ} ℓ_meta(τ, g(·; θ)). 8: for every task τ ∈ Ω^val_δ do 9: Calculate task loss ℓ_meta(τ, g(·; θ_δ)) = (1/N_τ) Σ_{i=0}^{N_τ−1} ℓ(ŷ^{(i+1)}, y^{(i+1)}) by (12). 10: end for 11: Update θ ← θ − β ∇_θ (1/|Ω^val_δ|) Σ_{τ∈Ω^val_δ} ℓ_meta(τ, g(·; θ_δ)). 12: end while |
| Open Source Code | Yes | Our code is provided at https://github.com/ovo67/Uni_ICL. |
| Open Datasets | Yes | We use META-DATASET (Triantafillou et al., 2020) for training. |
| Dataset Splits | Yes | Consider a domain distribution p(δ). Each domain δ determines a distribution of tasks p_δ(τ), from which a domain-specific task set Ω_δ = {D_τ}_{τ=1..T_δ} can be drawn. During pre-training, we manually split Ω_δ into two disjoint task sets: a training (support) task set Ω^tr_δ = {D_τ}_{τ=1..t} and a validation (query) task set Ω^val_δ = {D_τ}_{τ=t+1..T_δ}. In each task, a training set D^tr_τ = {(x_τ,i, y_τ,i)}_{i=1..n} is used to provide supervised information, and a validation set D^val_τ = {(x_τ,i, y_τ,i)}_{i=n+1..N_τ} is used to evaluate performance and optimize the meta-learner. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. It mentions using 'a 8-layer transformer' and 'ResNet (i.e., resnet-18)' as models, but no hardware specifications. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., Python 3.8, PyTorch 1.9, CUDA 11.1) needed to replicate the experiment. |
| Experiment Setup | Yes | The domain adaptation process, i.e., the inner loop of meta-level MAML, is configured as step=5, lr=0.0001, with 16 tasks-for-adaptation per domain. During testing, given 8/16/32 domain-specific tasks, the same adaptation process is applied. We considered 5-way 5-shot tasks at the meta-level and 8/16/32 tasks-for-adaptation per domain at the meta-meta-level. |
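The linear-regression setup quoted in the Research Type and Dataset Splits rows (task weights drawn as w ∼ N(µ_δ, Σ_δ) with (µ_δ, Σ_δ) ∼ p(δ), then the domain's tasks split into adaptation and evaluation sets) can be sketched as follows. This is a minimal illustration, not the paper's code: `sample_domain`, `sample_task`, the Gaussian hyper-priors on (µ_δ, Σ_δ), and all dimensions are assumed choices.

```python
import numpy as np

def sample_domain(d, rng):
    """Sample a domain delta ~ p(delta): hypothetical Gaussian hyper-priors
    over the mean and (diagonal) covariance of the task weights."""
    mu = rng.normal(size=d)                    # mu_delta
    sigma = np.abs(rng.normal(size=d)) + 0.1   # diagonal of Sigma_delta
    return mu, sigma

def sample_task(mu, sigma, n_points, rng):
    """Sample a task tau ~ p_delta(tau): a linear-regression dataset whose
    weight vector is drawn as w ~ N(mu_delta, Sigma_delta)."""
    w = rng.normal(mu, sigma)
    X = rng.normal(size=(n_points, len(mu)))
    return X, X @ w

# Draw one domain and split its tasks into an adaptation set and an
# evaluation set, mirroring the Omega^tr / Omega^val split (t = 16 here
# matches the 16 tasks-for-adaptation mentioned in the setup, but the
# total of 20 tasks is illustrative).
rng = np.random.default_rng(0)
mu, sigma = sample_domain(d=5, rng=rng)
tasks = [sample_task(mu, sigma, n_points=10, rng=rng) for _ in range(20)]
t = 16
omega_tr, omega_val = tasks[:t], tasks[t:]
```

Each element of `omega_tr` / `omega_val` is one task's dataset D_τ; a separate within-task split into D^tr_τ and D^val_τ would be applied on top of this for the meta-level loss.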
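The two-level update in the quoted Algorithm 1 (inner update θ_δ on the support tasks, outer update of θ from the adapted parameters' loss on the query tasks) can be sketched with a first-order approximation, using a plain linear model as a stand-in for the ICL transformer g(·; θ). The function names, learning rates, and the first-order shortcut (ignoring second derivatives through the inner step, as in first-order MAML) are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def task_loss_grad(theta, X, y):
    """Squared-error loss of a linear model y ~ X @ theta, and its gradient."""
    r = X @ theta - y
    return 0.5 * np.mean(r ** 2), X.T @ r / len(y)

def m2_icl_step(theta, omega_tr, omega_val, inner_lr=0.01, outer_lr=0.01):
    """One first-order step of the Algorithm-1 pattern on a sampled domain:
    adapt theta on the support tasks (inner update, line 7), then move theta
    along the adapted parameters' gradient on the query tasks (line 11)."""
    # Inner update: theta_delta = theta - inner_lr * mean support gradient.
    g_tr = np.mean([task_loss_grad(theta, X, y)[1] for X, y in omega_tr], axis=0)
    theta_delta = theta - inner_lr * g_tr
    # Outer update: evaluate theta_delta on the query tasks and step theta.
    g_val = np.mean([task_loss_grad(theta_delta, X, y)[1] for X, y in omega_val], axis=0)
    return theta - outer_lr * g_val
```

A full run would wrap this step in the while-loop of Algorithm 1, resampling a domain δ ∼ p(δ) each iteration; with the exact MAML outer gradient, ∇_θ would also flow through the inner update rather than being cut as it is here.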