Improving Reasoning Performance in Large Language Models via Representation Engineering
Authors: Bertram Højer, Oliver Jarvis, Stefan Heinrich
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply control vectors to Mistral-7B-Instruct and a range of Pythia models on an inductive, a deductive, and a mathematical reasoning task. We show that an LLM can, to a certain degree, be controlled to improve its perceived reasoning ability by modulating activations. The intervention is dependent upon the ability to reliably extract the model's typical state when correctly solving a task. Our results suggest that reasoning performance can be modulated in the same manner as other information-processing tasks performed by LLMs and demonstrate that we are capable of improving performance on specific tasks via a simple intervention on the residual stream with no additional training. |
| Researcher Affiliation | Academia | Bertram Højer, Oliver Jarvis, Stefan Heinrich, Department of Computer Science, IT University of Copenhagen, Denmark EMAIL |
| Pseudocode | No | The paper does not contain explicit pseudocode or algorithm blocks. It provides mathematical equations for defining control vectors and transformer operations (Equation 1, 2, 3) but these are not formatted as pseudocode. |
| Open Source Code | Yes | We publish the code for deriving control vectors and analyzing model representations (code: https://github.com/bertramhojer/improve-reasoning-iclr-2025). The method allows us to improve performance on reasoning benchmarks and assess how control vectors influence the final logit distribution of a model via metrics such as KL divergence and entropy. We apply control vectors to Mistral-7B-Instruct and a range of Pythia models on an inductive, a deductive, and a mathematical reasoning task. |
| Open Datasets | Yes | bAbI comprises various reasoning tasks, one of which relates to deductive reasoning and contains 2,000 examples. GSM8K consists of high-quality grade school math problems on which relatively capable LLMs still struggle. We do not perform any additional pre-processing and have downloaded the data directly from https://huggingface.co/datasets/Muennighoff/babi. |
| Dataset Splits | Yes | For each dataset we create train and test splits with stratified labels. We then derive the control vector based on model representations when it generates outputs on examples from the train split and test model performance with a control vector applied on the test set. We used an 80/20 train-test split to train and evaluate the performance on control vectors. |
| Hardware Specification | No | The paper mentions using pre-trained models (Pythia-1.4B, Pythia-2.8B, Mistral-7B-Instruct) but does not specify the hardware used to run the experiments or extract activations. |
| Software Dependencies | No | Our framework is built as a wrapper around PyTorch, enabling easy extraction of hidden dimension representations and application of control vectors. The models described in section 3.3 were loaded using the Hugging Face API and details on model version are described there. |
| Experiment Setup | Yes | We generally assess the impact of the intervention at α ∈ [−1, 1] at increments of 0.1, but look at a range of [−3, 3] for Mistral-7B-Instruct. When deriving control vectors we get a control vector for each layer, previous work however indicates that only applying the vectors to the middle layer is enough to induce strong changes to model outputs (Templeton et al., 2024). |
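The rows above describe the core operation: derive a per-layer control vector from hidden-state representations on the train split, then add it to the residual stream scaled by a coefficient α. The sketch below illustrates the general idea with a mean-difference steering vector on toy activations; the function names, the mean-difference construction, and the array shapes are illustrative assumptions, not the paper's exact derivation (which works on activations extracted from a wrapped PyTorch model).

```python
import numpy as np


def derive_control_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Hypothetical control vector: difference of mean activations between
    examples the model handles correctly (pos) and incorrectly (neg).
    Shapes: (n_examples, hidden_dim) -> (hidden_dim,)."""
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)


def apply_control_vector(hidden: np.ndarray, v: np.ndarray, alpha: float) -> np.ndarray:
    """Add the scaled control vector to residual-stream activations.
    alpha = 0 leaves activations unchanged; the review notes alpha is swept
    over [-1, 1] (and [-3, 3] for Mistral-7B-Instruct)."""
    return hidden + alpha * v


# Toy stand-in for extracted middle-layer activations: 10 examples, hidden size 8.
rng = np.random.default_rng(0)
pos = rng.normal(0.5, 1.0, size=(10, 8))   # activations on correctly solved examples
neg = rng.normal(-0.5, 1.0, size=(10, 8))  # activations on incorrectly solved examples

v = derive_control_vector(pos, neg)
steered = apply_control_vector(neg, v, alpha=1.0)
```

In practice the intervention is applied at a single (middle) layer during generation rather than to a batch of cached activations, e.g. via a PyTorch forward hook on that layer.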