Provable In-Context Vector Arithmetic via Retrieving Task Concepts
Authors: Dake Bu, Wei Huang, Andi Han, Atsushi Nitanda, Qingfu Zhang, Hau-San Wong, Taiji Suzuki
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical simulations corroborate our theoretical insights. |
| Researcher Affiliation | Academia | 1Department of Computer Science, City University of Hong Kong, Hong Kong SAR 2Center for Advanced Intelligence Project, RIKEN, Japan 3School of Mathematics and Statistics, The University of Sydney, Australia 4Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), Singapore 5Centre for Frontier AI Research (CFAR), Agency for Science, Technology and Research (A*STAR), Singapore 6College of Computing and Data Science, Nanyang Technological University, Singapore 7Department of Mathematical Informatics, The University of Tokyo, Japan. Correspondence to: Wei Huang <EMAIL>, Hau-San Wong <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Training algorithm |
| Open Source Code | No | The paper does not contain any explicit statement about open-sourcing their code or a repository link. |
| Open Datasets | No | In this section, we present our data modeling based on the observations of task vector arithmetic in factual-recall ICL illustrated in Figure 1. We found the near-orthogonal properties in Figure 1 coincide with Park et al. (2025), which suggests that LLMs encode high- and low-level concepts in an approximately orthogonal manner. Specifically, we treat the task vector as a high-level concept representation, while orthogonal components represent task-specific low-level concepts. Details are deferred to Appendix C. |
| Dataset Splits | No | The paper defines 'Training Setups' and 'Test Setup' where data is generated from specific distributions (PQA, PT, PTQA) with noise, rather than splitting a fixed, external dataset into explicit training, validation, and test portions with specified percentages or counts. |
| Hardware Specification | No | The paper describes 'Empirical simulations' in Section 5 but does not mention any specific hardware (e.g., GPU/CPU models, memory) used for these simulations. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., programming languages, libraries, frameworks) used for the experiments. |
| Experiment Setup | Yes | For comparison, we use the same parameters for both ICL-trained and QA-trained models and plot the 1-sigma error dynamics, as illustrated in Figures 2 and 3: K = 2, K′ = 100, d = 3000, n = 200, M = 30, L = L′ = 30, η = 5, q_V = 10⁻⁵, σ₀ = 10⁻³, σ₁ = 5×10⁻³, σ_p = σ′_p = 10⁻². The QA-trained model is trained for T = 2000 epochs, while the ICL-trained model undergoes a longer training process with T = 5000 epochs. |
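The setup quoted above can be collected into a single configuration, which also lets one check the near-orthogonality property the paper builds on (Figure 1; Park et al., 2025). The sketch below is illustrative only: the variable names, the prime/subscript placement (K′, L′, q_V, σ₀, …), and the interpretation of each hyperparameter are assumptions reconstructed from the extracted text, not the authors' code.

```python
import numpy as np

# Hedged sketch of the quoted experiment setup. All names and the
# meaning attached to each value are assumptions; the paper releases
# no code, so this is not the authors' implementation.
config = {
    "K": 2,            # assumed: number of high-level task concepts
    "K_prime": 100,    # assumed: number of low-level concepts
    "d": 3000,         # embedding dimension
    "n": 200,          # assumed: number of in-context samples
    "M": 30,
    "L": 30,
    "L_prime": 30,
    "eta": 5,          # learning rate
    "q_V": 1e-5,
    "sigma_0": 1e-3,
    "sigma_1": 5e-3,
    "sigma_p": 1e-2,
    "T_QA": 2000,      # epochs, QA-trained model
    "T_ICL": 5000,     # epochs, ICL-trained model (longer training)
}

# Near-orthogonality illustration: in d = 3000 dimensions, two random
# unit vectors have cosine similarity close to zero, so high- and
# low-level concept directions barely interfere with one another.
rng = np.random.default_rng(0)
u = rng.standard_normal(config["d"])
v = rng.standard_normal(config["d"])
u /= np.linalg.norm(u)
v /= np.linalg.norm(v)
cosine = float(u @ v)
print(f"|cos(u, v)| = {abs(cosine):.4f}")  # small for large d
```

The cosine concentrates around 0 with standard deviation roughly 1/√d ≈ 0.018 here, which is the sense in which randomly placed concept directions are "approximately orthogonal".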