Capability Localization: Capabilities Can be Localized rather than Individual Knowledge
Authors: Xiusheng Huang, Jiaxiang Liu, Yequan Wang, Jun Zhao, Kang Liu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We found through fidelity and reliability evaluation experiments that individual knowledge cannot be localized. We then constructed a dataset for decoupling experiments and discovered the potential for localizing data commonalities. To further reveal this phenomenon, this paper proposes a Commonality Neuron Localization (CNL) method, which successfully locates commonality neurons and achieves a neuron overlap rate of 96.42% on the GSM8K dataset. Finally, we demonstrate through cross-data experiments that commonality neurons are a collection of capability neurons that possess the capability to enhance performance. |
| Researcher Affiliation | Academia | ¹The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences; ²School of Artificial Intelligence, University of Chinese Academy of Sciences; ³Beijing Academy of Artificial Intelligence, Beijing, China. EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper includes mathematical equations (e.g., (1)-(6), (7)-(8)), but no explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor any structured, code-like procedures in the main text. |
| Open Source Code | Yes | Our code is available at https://github.com/nlpkeg/Capability-Neuron-Localization. |
| Open Datasets | Yes | We first randomly select 1000 factual samples from the COUNTERFACT (Meng et al., 2022a) dataset... factual dataset zsRE (Levy et al., 2017)... Math: (1) GSM8K (Cobbe et al., 2021)... (2) MetaMath (Yu et al., 2023)... Program: Code25K (Beguš, 2021)... Language: (1) Emotion (Kosti et al., 2019)... (2) IMDb (Tripathi et al., 2020) |
| Dataset Splits | No | Section 5.4, 'Proving I: Enhance Experiment', mentions: 'Specifically, we fine-tune on the training set and evaluate on the validation set.' However, it does not specify the ratios, sample counts, or methodology used for these splits, nor does it refer to any standard predefined splits for the datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for running the experiments. It only mentions the model sizes tested (e.g., Llama2-7B, GPTJ-6B). |
| Software Dependencies | No | The paper mentions using 'Adam (Kingma, 2014) is selected as the optimizer algorithm' but does not specify software dependencies like programming languages (e.g., Python), libraries (e.g., PyTorch, TensorFlow), or their specific version numbers. |
| Experiment Setup | Yes | Adam (Kingma, 2014) is selected as the optimizer with lr=1e-5, and the optimized parameter sets are as follows. Random: randomly selected neurons matching the number of located neurons, occupying 0.15% of the overall neuron parameters. W/o located: the located neurons are masked (their parameters set to 0) and all other neurons are fine-tuned, occupying 99.85% of the overall neuron parameters. Located: only the located neurons are fine-tuned, occupying 0.15% of the overall neuron parameters. |
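The three fine-tuning configurations in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: the `build_trainable_mask` helper and the flat neuron indexing are assumptions for clarity; in the paper, the located neuron set comes from the CNL localization procedure.

```python
import random

def build_trainable_mask(num_neurons, located, mode, seed=0):
    """Return the set of neuron indices left trainable under each configuration.

    `located` is a hypothetical stand-in for the CNL-located neuron set.
    """
    located = set(located)
    if mode == "located":
        # Fine-tune only the located neurons (~0.15% of neuron parameters).
        return located
    if mode == "wo_located":
        # Mask the located neurons and fine-tune everything else (~99.85%).
        return set(range(num_neurons)) - located
    if mode == "random":
        # A random subset with the same size as the located set.
        rng = random.Random(seed)
        return set(rng.sample(range(num_neurons), len(located)))
    raise ValueError(f"unknown mode: {mode}")

# Toy example: 10,000 neurons of which 15 (0.15%) are "located".
located = list(range(15))
for mode in ("located", "wo_located", "random"):
    mask = build_trainable_mask(10_000, located, mode)
    print(mode, len(mask) / 10_000)
```

In an actual run, the mask would be applied by zeroing gradients (or setting `requires_grad`) on the complementary parameters before each optimizer step; the toy proportions above reproduce the 0.15% / 99.85% split reported in the paper.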