CollabEdit: Towards Non-destructive Collaborative Knowledge Editing

Authors: Jiamu Zheng, Jinghuai Zhang, Tianyu Du, Xuhong Zhang, Jianwei Yin, Tao Lin

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on two canonical datasets demonstrate the superiority of COLLABEDIT compared to other destructive baselines, and results shed light on addressing three collaborative KE challenges and future applications. Our code is available at https://github.com/LINs-lab/CollabEdit.
Researcher Affiliation | Academia | Jiamu Zheng^1, Jinghuai Zhang^3, Tianyu Du^1, Xuhong Zhang^1, Jianwei Yin^1, Tao Lin^2 (^1 Zhejiang University, ^2 Westlake University, ^3 University of California, Los Angeles)
Pseudocode | Yes | Algorithm 1 (COLLABEDIT: Non-destructive Collaborative Knowledge Editing) and Algorithm 2 (Get Delta And KKT).
Open Source Code | Yes | Our code is available at https://github.com/LINs-lab/CollabEdit.
Open Datasets | Yes | Following the literature (Meng et al., 2022; 2023), we use Multi-CounterFact (MCF) (Meng et al., 2022) and zsRE (Levy et al., 2017) as datasets and evaluate the editing performance on GPT2-XL (Radford et al., 2019) and GPT-J (6B) (Wang & Komatsuzaki, 2021).
Dataset Splits | No | The paper mentions using the Multi-CounterFact (MCF) and zsRE datasets and refers to existing literature for evaluation. However, it does not explicitly provide train/test/validation splits or percentages, nor does it cite a specific standard split to be used for reproduction.
Hardware Specification | No | The paper does not contain any specific hardware details such as GPU models, CPU types, or other computing resources used for running the experiments. It only mentions evaluating on GPT2-XL and GPT-J (6B) models.
Software Dependencies | No | The paper discusses various KE algorithms (MEMIT, MALMEN) and model merging techniques (SIMPLE-AVERAGE, TASK-ARITHMETIC, TIES-MERGING) used in its methodology. However, it does not provide version numbers for any programming languages (e.g., Python), libraries (e.g., PyTorch, TensorFlow), or other software dependencies.
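For context on the merging baselines named in this row, below is a minimal NumPy sketch of the two simplest ones. The function names, the `alpha` scale, and the use of flat arrays in place of model weight tensors are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def simple_average(edited_weights):
    """SIMPLE-AVERAGE: element-wise mean of per-editor weight arrays."""
    return sum(edited_weights) / len(edited_weights)

def task_arithmetic(pretrained, edited_weights, alpha=1.0):
    """TASK-ARITHMETIC: sum each editor's delta from the pretrained
    weights, then add the scaled total delta back to the base."""
    delta = sum(w - pretrained for w in edited_weights)
    return pretrained + alpha * delta

base = np.zeros(3)
edits = [np.ones(3), np.full(3, 2.0)]
avg = simple_average(edits)
merged = task_arithmetic(base, edits)
```

With a zero base and two edits of 1.0 and 2.0 per element, the average is 1.5 and the task-arithmetic merge (alpha=1) is 3.0, which illustrates why naive merging can over-apply overlapping edits.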
Experiment Setup | Yes | For consistency, we edit the same set of layers R as MEMIT, such as the 13th–17th layers of GPT-2 XL. [...] We compute C = µ E[k kᵀ], where E[k kᵀ] is estimated as an uncentered covariance statistic collected using an empirical sample of vector inputs to the layer (e.g., 100,000 Wikipedia records). µ is a hyperparameter that balances the weighting of new vs. old associations (a typical value of µ is 1.5 × 10⁴ according to MEMIT).
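For concreteness, the statistic quoted above can be sketched in NumPy as follows; the function name `estimate_C` and the small synthetic sample (standing in for the ~100,000 Wikipedia records) are illustrative assumptions, not the authors' code.

```python
import numpy as np

def estimate_C(keys, mu=1.5e4):
    """Estimate C = mu * E[k k^T] as an uncentered second-moment
    statistic over sampled layer-input vectors.

    keys : (N, d) array, one layer-input vector k per row
    mu   : hyperparameter weighting new vs. old associations
           (1.5e4 is the typical MEMIT value)
    """
    n = keys.shape[0]
    return mu * (keys.T @ keys) / n

# Tiny synthetic stand-in for the sampled layer inputs.
rng = np.random.default_rng(0)
K = rng.standard_normal((1000, 8))
C = estimate_C(K)
```

Because the statistic is uncentered (no mean subtraction), C is simply the scaled Gram matrix of the inputs and is symmetric positive semi-definite by construction.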