Identifying Query-Relevant Neurons in Large Language Models for Long-Form Texts
Authors: Lihu Chen, Adam Dejl, Francesca Toni
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations demonstrate that our method outperforms baseline methods significantly. More importantly, analysis of neuron distributions reveals the presence of visible localized regions, particularly within different subject domains. Finally, we show potential applications of our detected neurons in knowledge editing and neuron-based prediction. |
| Researcher Affiliation | Academia | Imperial College London, UK |
| Pseudocode | No | The paper describes a framework and its components using mathematical formulations (e.g., equations 1-7) and step-by-step descriptions in text, but it does not present any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code https://github.com/tigerchen52/qrneuron |
| Open Datasets | Yes | Domain Dataset is derived from MMLU (Hendrycks et al. 2020), a multi-choice QA benchmark designed to evaluate models across a wide array of subjects with varying difficulty levels. [...] Language Dataset is adapted from Multilingual LAMA (Kassner, Dufter, and Schütze 2021), which is a dataset to investigate knowledge in language models in a multilingual setting covering 53 languages. |
| Dataset Splits | Yes | Domain Dataset is derived from MMLU (Hendrycks et al. 2020), a multi-choice QA benchmark designed to evaluate models across a wide array of subjects with varying difficulty levels. The subjects encompass traditional disciplines such as mathematics and history, as well as specialized fields like law and ethics. In our study, we select six high school exam subjects from the test set: Biology, Physics, Chemistry, Mathematics, Computer Science, and Geography. [...] Hyper-parameters are selected based on a hold-out set of biology queries with 50 samples. |
| Hardware Specification | Yes | We ran all experiments on three NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions using Llama-2-7B and Mistral-7B models, but does not provide specific software dependency versions (e.g., PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | As for the hyper-parameters, the number of estimation steps was set to m=16 and the attribution threshold t to 0.2 times the maximum attribution score. The template number was |Q| = 3, the frequency u for obtaining common neurons was 30%, and the top-v for selecting coarse neurons was 20. We ran all experiments on three NVIDIA V100 GPUs. It took 120 seconds on average to locate neurons for a query with three prompt templates. |
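The quoted hyper-parameters (attribution threshold t = 0.2 × max score, top-v = 20 coarse neurons, common-neuron frequency u = 30%) can be sketched as a small selection routine. This is a minimal illustration under stated assumptions, not the authors' implementation (their code is at the repository linked above); all function and variable names here are hypothetical.

```python
from collections import Counter

def select_coarse_neurons(scores, t_ratio=0.2, top_v=20):
    """Hedged sketch: keep neurons whose attribution score exceeds
    t_ratio * max score, then return the top_v highest-scoring ones.
    `scores` maps a neuron identifier to its attribution score."""
    if not scores:
        return []
    threshold = t_ratio * max(scores.values())
    passing = [(n, s) for n, s in scores.items() if s > threshold]
    passing.sort(key=lambda item: -item[1])  # highest attribution first
    return [n for n, _ in passing[:top_v]]

def drop_common_neurons(per_query_neurons, u=0.3):
    """Hedged sketch: remove neurons that appear in more than a fraction
    u of all queries, i.e. neurons too 'common' to be query-specific."""
    counts = Counter(n for neurons in per_query_neurons for n in set(neurons))
    limit = u * len(per_query_neurons)
    return [[n for n in neurons if counts[n] <= limit]
            for neurons in per_query_neurons]
```

For example, with scores `{"n1": 1.0, "n2": 0.5, "n3": 0.1}` and the paper's t_ratio of 0.2, the threshold is 0.2, so `n3` is filtered out before the top-v cut.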