Provable In-Context Vector Arithmetic via Retrieving Task Concepts

Authors: Dake Bu, Wei Huang, Andi Han, Atsushi Nitanda, Qingfu Zhang, Hau-San Wong, Taiji Suzuki

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirical simulations corroborate our theoretical insights."
Researcher Affiliation | Academia | 1 Department of Computer Science, City University of Hong Kong, Hong Kong SAR; 2 Center for Advanced Intelligence Project, RIKEN, Japan; 3 School of Mathematics and Statistics, The University of Sydney, Australia; 4 Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), Singapore; 5 Centre for Frontier AI Research (CFAR), Agency for Science, Technology and Research (A*STAR), Singapore; 6 College of Computing and Data Science, Nanyang Technological University, Singapore; 7 Department of Mathematical Informatics, The University of Tokyo, Japan. Correspondence to: Wei Huang <EMAIL>, Hau-San Wong <EMAIL>.
Pseudocode | Yes | Algorithm 1: Training algorithm
Open Source Code | No | The paper contains no explicit statement about open-sourcing the code and provides no repository link.
Open Datasets | No | "In this section, we present our data modeling based on the observations of task-vector arithmetic in factual-recall ICL illustrated in Figure 1. We found that the near-orthogonal properties in Figure 1 coincide with Park et al. (2025), which suggests that LLMs encode high- and low-level concepts in an approximately orthogonal manner. Specifically, we treat the task vector as a high-level concept representation, while orthogonal components represent task-specific low-level concepts. Details are deferred to Appendix C."
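The decomposition quoted above can be illustrated with a small numeric sketch. The vectors below are hypothetical stand-ins, not the paper's actual representations: a random "task vector" plays the high-level concept, and low-level concept vectors are made orthogonal to it via a Gram-Schmidt projection step.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return dot(u, u) ** 0.5


def orthogonalize(v, basis):
    """Subtract from v its projection onto each basis vector (Gram-Schmidt step)."""
    for b in basis:
        coef = dot(v, b) / dot(b, b)
        v = [vi - coef * bi for vi, bi in zip(v, b)]
    return v


random.seed(0)
d = 3000  # embedding dimension, matching the simulation setup reported in the paper

# Hypothetical high-level task vector (e.g. a "capital-of" task concept).
task_vec = [random.gauss(0, 1) for _ in range(d)]

# Low-level concept vectors, projected to be orthogonal to the task vector.
low_level = [
    orthogonalize([random.gauss(0, 1) for _ in range(d)], [task_vec])
    for _ in range(3)
]

# Cosine similarity with the task vector vanishes up to floating-point error;
# independent random draws in high dimension are already nearly orthogonal
# to one another even without the projection.
for v in low_level:
    cos = dot(task_vec, v) / (norm(task_vec) * norm(v))
    assert abs(cos) < 1e-9
```

This mirrors the geometric picture in the quoted passage: the task (high-level) direction and the task-specific (low-level) directions occupy approximately orthogonal subspaces.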
Dataset Splits | No | The paper defines 'Training Setups' and a 'Test Setup' in which data are generated from specific distributions (P_QA, P_T, P_TQA) with noise, rather than splitting a fixed external dataset into explicit training, validation, and test portions with specified percentages or counts.
Hardware Specification | No | The paper describes 'Empirical simulations' in Section 5 but does not mention the specific hardware (e.g., GPU/CPU models, memory) used to run them.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., programming languages, libraries, frameworks) used for the experiments.
Experiment Setup | Yes | "For comparison, we use the same parameters for both the ICL-trained and QA-trained models and plot the 1-sigma error dynamics, as illustrated in Figures 2 and 3: K = 2, K = 100, d = 3000, n = 200, M = 30, L = L = 30, η = 5, q_V = 10^-5, σ_0 = 10^-3, σ_1 = 5×10^-3, σ_p = σ̃_p = 10^-2. The QA-trained model is trained for T = 2000 epochs, while the ICL-trained model undergoes a longer training process with T = 5000 epochs."
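The reported setup can be collected into a single configuration sketch. Note that the extracted text lists two K values and repeated L and σ_p symbols whose original subscripts are not recoverable here, so the distinguishing field names below (K_a/K_b, etc.) are assumed labels, not the paper's notation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hyperparameters from the paper's simulation section.

    Field names for the ambiguously duplicated symbols (the two K values,
    the two L values, the two sigma_p values) are illustrative assumptions.
    """
    K_a: int = 2           # first reported K
    K_b: int = 100         # second reported K
    d: int = 3000          # embedding dimension
    n: int = 200           # number of samples
    M: int = 30
    L_a: int = 30          # first reported L
    L_b: int = 30          # second reported L
    eta: float = 5.0       # step-size constant as printed
    q_V: float = 1e-5
    sigma0: float = 1e-3   # initialization scale
    sigma1: float = 5e-3
    sigma_p: float = 1e-2  # noise scale (both sigma_p values coincide)
    epochs_qa: int = 2000  # QA-trained model
    epochs_icl: int = 5000 # ICL-trained model (longer training, as stated)


cfg = ExperimentConfig()
# The paper states the ICL-trained model undergoes a longer training process.
assert cfg.epochs_icl > cfg.epochs_qa
```

Freezing the dataclass keeps the two training runs (QA-trained and ICL-trained) on identical parameters, which is the comparison the quoted passage describes.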