REEF: Representation Encoding Fingerprints for Large Language Models
Authors: Jie Zhang, Dongrui Liu, Chen Qian, Linfeng Zhang, Yong Liu, Yu Qiao, Jing Shao
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we provide a comprehensive evaluation of REEF. Section 5.1 evaluates REEF's effectiveness in distinguishing LLMs derived from the root victim model from unrelated models. Following this, Section 5.2 assesses REEF's robustness to subsequent developments of the victim model, such as fine-tuning, pruning, merging, and permutations. Section 5.3 presents ablation studies on REEF across varying sample numbers and datasets. Finally, Section 5.4 discusses REEF's sensitivity to training data and its capacity for adversarial evasion. Figure 3: Heatmaps depicting the CKA similarity between the representations of the victim LLM (Llama-2-7B) and those of various suspect LLMs on the same samples. |
| Researcher Affiliation | Academia | 1 Shanghai Artificial Intelligence Laboratory 2 University of Chinese Academy of Sciences 3 Renmin University of China 4 Shanghai Jiao Tong University |
| Pseudocode | No | The paper describes the CKA similarity index and its mathematical formulation in Section 4 and provides proofs in Appendix A, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is publicly accessible at https://github.com/AI45Lab/REEF. To ensure the reproducibility of this study, we have uploaded the source code as part of the supplementary material. Furthermore, the code and datasets will be made available on GitHub after the completion of the double-blind review process, enabling others to replicate our study. |
| Open Datasets | Yes | We use both a linear kernel and an RBF kernel to compute the layer-wise and inter-layer CKA similarity of representations between the victim and suspect models on 200 samples from the TruthfulQA dataset (Lin et al., 2022). To assess the effectiveness of REEF across various data types, we also conduct experiments using SST2 (Socher et al., 2013), ConfAIde (Mireshghallah et al., 2023), PKU-SafeRLHF (Ji et al., 2024), and ToxiGen (Hartvigsen et al., 2022). |
| Dataset Splits | Yes | The dataset is split into training and test sets with a 4:1 ratio. (TruthfulQA dataset) |
| Hardware Specification | Yes | Using the dataset provided in the original paper, we conduct pre-training on 8 A100 GPUs with different random seeds for data shuffling. |
| Software Dependencies | No | The paper mentions using a 'linear kernel and a Radial Basis Function (RBF) kernel' and 'cross-entropy' for task loss, but does not provide specific version numbers for any software libraries or dependencies used in the implementation. |
| Experiment Setup | Yes | For training, we use the TruthfulQA dataset (Lin et al., 2022), concatenating each question with its truthful answer as positive samples and with its false answer as negative samples. The dataset is split into training and test sets with a 4:1 ratio. We use both a linear kernel and an RBF kernel to compute the layer-wise and inter-layer CKA similarity of representations between the victim and suspect models on 200 samples from the TruthfulQA dataset. We focus on reporting the similarity at layer 18 in subsequent experiments. The hyperparameters for pre-training are set as follows: a global batch size of 512, a learning rate of 4e-4, a micro-batch size of 8, a maximum of 56,960 steps, a weight decay of 0.1, beta1 of 0.9, beta2 of 0.95, gradient clipping at 1.0, and a minimum learning rate of 4e-5. |
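Several rows above hinge on computing CKA similarity between victim and suspect representations with a linear or RBF kernel. The following is a minimal NumPy sketch of both variants, not REEF's released implementation; the function names and the median-heuristic RBF bandwidth are illustrative assumptions. Inputs are `(n_samples, hidden_dim)` activation matrices extracted from the two models on the same inputs (e.g., the 200 TruthfulQA samples at layer 18).

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representation matrices.

    X: (n, d1), Y: (n, d2) -- activations of the victim and suspect
    models on the same n samples. Returns a score in [0, 1].
    """
    # Center each feature dimension.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # HSIC-style cross-covariance norm, normalized by self-similarities.
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)

def rbf_kernel(X, sigma=None):
    """RBF (Gaussian) kernel matrix; sigma defaults to the median heuristic."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    if sigma is None:
        sigma = np.sqrt(np.median(d2[d2 > 0]))
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_cka(KX, KY):
    """CKA from precomputed kernel matrices (use rbf_kernel for the RBF variant)."""
    n = KX.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    KXc, KYc = H @ KX @ H, H @ KY @ H
    hsic = np.sum(KXc * KYc)
    return hsic / (np.linalg.norm(KXc, "fro") * np.linalg.norm(KYc, "fro"))
```

A useful property for fingerprinting: linear CKA is invariant to orthogonal transformations of the features, so column permutations of the hidden states (one of the evasion attempts the paper considers) leave the score unchanged.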