Iterative Vectors: In-Context Gradient Steering without Backpropagation
Authors: Yiting Liu, Zhi-Hong Deng
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate IVs across various tasks using four popular models and observe significant improvements. Our findings suggest that in-context activation steering is a promising direction, opening new avenues for future research. |
| Researcher Affiliation | Academia | State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University. Correspondence to: Zhi-Hong Deng <EMAIL>. |
| Pseudocode | Yes | The pseudocode for the extraction and evaluation process is available in Appendix B. To facilitate understanding, Appendix C includes an example of the processes described. Algorithm 1 Extraction of Iterative Vectors Algorithm 2 Evaluation Algorithm 3 Episodic Functions |
| Open Source Code | Yes | Our code is available on GitHub. |
| Open Datasets | Yes | Details of all the datasets used in this paper can be found in Appendix E, while additional results with the other two metrics are provided in Appendix F. E. Datasets A full list of all datasets utilized in this research, along with their corresponding access labels, is detailed in Table 5. The datasets are obtained from Hugging Face (Lhoest et al., 2021). |
| Dataset Splits | Yes | For a given split of an n-way k-shot classification task T = {T_train, T_val, T_test}, which comprises textual query-answer pairs (x, y), an ICL episode is sampled. We evaluate over 200 episodes for both extraction (T_train) and hyperparameter search (T_val). |
| Hardware Specification | Yes | All experiments can be performed on a single Nvidia RTX A6000 GPU unless stated otherwise. Conducted on 3 Nvidia RTX A6000 GPUs. |
| Software Dependencies | No | The paper mentions various language models (GPT-J-6B, Llama 2, Llama 3.1) and the Hugging Face platform for datasets but does not provide specific version numbers for software libraries or dependencies used in their implementation. |
| Experiment Setup | Yes | For the hyperparameters of IVs, we use a fixed iterative batch size of b = 10 and explore the extraction strength and inference strength α1, α2 ∈ {0.1, 0.3, 0.5, 0.7, 0.9} across all tasks. Regarding the extraction shot k, we test k ∈ {1, 2, 3, 4} for both TVs and IVs. All experiments were conducted using a predetermined random seed (42) to mitigate selection bias. To ensure a robust representation of result distributions, the tests are averaged over a substantial number of episodes, namely 10,000. We reuse hyperparameters obtained from prior searches in the main experiment (k = 4, b = 10 fixed, α1 = 0.3, α2 = 0.5). |
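
The experiment-setup row describes an exhaustive grid search over the extraction shot k and the two steering strengths α1, α2, scored on validation episodes. A minimal sketch of that search loop is below; `eval_on_val` is a hypothetical placeholder for the paper's actual validation-accuracy measurement, and only the grid values come from the source.

```python
import itertools
import random

def eval_on_val(k, alpha1, alpha2, seed=42):
    """Hypothetical stand-in for the paper's validation metric.

    In the real pipeline this would extract Iterative Vectors with
    shot k and strength alpha1, steer inference with alpha2, and
    average accuracy over the T_val episodes. Here it just returns a
    deterministic pseudo-random score so the loop is runnable.
    """
    rng = random.Random((k, alpha1, alpha2, seed))
    return rng.random()

# Grid reported in the paper: k in {1..4}, alpha1/alpha2 in {0.1..0.9}.
shots = [1, 2, 3, 4]
strengths = [0.1, 0.3, 0.5, 0.7, 0.9]

# Pick the configuration with the best validation score.
best = max(
    itertools.product(shots, strengths, strengths),
    key=lambda cfg: eval_on_val(*cfg),
)
k_best, a1_best, a2_best = best
print(f"best config: k={k_best}, alpha1={a1_best}, alpha2={a2_best}")
```

With the paper's 4 × 5 × 5 grid this is 100 evaluations per task, which matches the reported practice of searching once over 200 validation episodes and then reusing the found hyperparameters (k = 4, α1 = 0.3, α2 = 0.5) in the main 10,000-episode runs.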