DOCS: Quantifying Weight Similarity for Deeper Insights into Large Language Models
Authors: Zeping Min, Xinshang Wang
ICLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce a novel index, the Distribution of Cosine Similarity (DOCS), for quantitatively assessing the similarity between weight matrices in Large Language Models (LLMs), aiming to facilitate the analysis of their complex architectures. Leveraging DOCS, our analysis uncovers intriguing patterns in the latest open-source LLMs: adjacent layers frequently exhibit high weight similarity and tend to form clusters, suggesting depth-wise functional specialization. Additionally, we prove that DOCS is theoretically effective in quantifying similarity for orthogonal matrices, a crucial aspect given the prevalence of orthogonal initializations in LLMs. This research contributes to a deeper understanding of LLM architecture and behavior, offering tools with potential implications for developing more efficient and interpretable models. In this work, we extend the application of similarity analysis by directly examining the weight matrices of various LLMs, instead of focusing on representations. By analyzing the weights themselves, we aim to uncover deeper insights into the model's structure and functionality that are not apparent from representations alone. [...] We conduct experiments to demonstrate the capabilities of DOCS and to gain insights into the internal structure of LLMs. |
| Researcher Affiliation | Industry | Zeping Min Alibaba Group Hupan Laboratory AMSS, Chinese Academy of Sciences EMAIL Xinshang Wang Alibaba Group EMAIL |
| Pseudocode | Yes | Algorithm 1: Computation of the DOCS Similarity Index S_DOCS. 1: Input: matrices X = [X_1, X_2, ..., X_m] ∈ R^(n×m) and Y = [Y_1, Y_2, ..., Y_m] ∈ R^(n×m). 2: Output: similarity index S_DOCS. 3: function MAXCOSSIM(A, B): 4: compute the cosine similarity matrix C ∈ R^(m×m), where C_jk = (A_j · B_k) / (‖A_j‖ ‖B_k‖); 5: for each column A_j, find s_{A_j} = max_k \|C_jk\|; 6: return s_A = [s_{A_1}, s_{A_2}, ..., s_{A_m}]; 7: end function. 8: Compute s_X = MAXCOSSIM(X, Y). 9: Compute s_Y = MAXCOSSIM(Y, X). 10: Fit a Gumbel distribution to s_X to estimate the location parameter u_X using maximum likelihood estimation. 11: Fit a Gumbel distribution to s_Y to estimate the location parameter u_Y using maximum likelihood estimation. 12: Compute the similarity index S_DOCS = u_X + u_Y. |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-sourcing of its own methodology (DOCS). It mentions 'https://github.com/huggingface/transformers' in a footnote, but this refers to third-party LLM implementations the authors used, not their own code. |
| Open Datasets | Yes | Our analysis uncovers intriguing patterns in the latest open-source LLMs: [...] We conduct experiments to demonstrate the capabilities of DOCS and to gain insights into the internal structure of LLMs. In LLM implementations, the rows of a weight matrix correspond to output dimensions, and the columns correspond to input dimensions. [...] Figure 2 provides a visual comparison of eight different similarity indices applied to the MLP-UP layers of the Meta-Llama-3.1-8B-Instruct model. [...] We investigated the similarity patterns between neighboring transformer layers by analyzing various weight matrices (Wv, Wk, Wq, Wo, MLP-UP, MLP-DOWN) in various LLMs. We employed DOCS to compute and visualize these similarities. Figure 3 illustrates the results for Wk, Wq, and MLP-DOWN on gemma-2-27b-it. [...] LLMs, including GPT-2 (Radford et al., 2019), Llama (Touvron et al., 2023), Mistral (Jiang et al., 2023), Llama 3 (Dubey et al., 2024), GPT-NeoX-20B (Black et al., 2022), OPT (Zhang et al., 2022), CodeGeeX (Zheng et al., 2023), GLM-130B (Zeng et al., 2022), and FLM (Li et al., 2023), adopt architectures where all layers have the same size. |
| Dataset Splits | No | The paper focuses on analyzing the weights of existing Large Language Models (LLMs) rather than training new models or performing evaluations that require dataset splits (training, validation, test). Therefore, it does not provide information on dataset splits. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU models, CPU types, memory) used for running the experiments or analyses. |
| Software Dependencies | No | The paper mentions 'https://github.com/huggingface/transformers' as a source for LLM implementations, but it does not specify any software dependencies (libraries, frameworks, or operating systems) with version numbers that were used to implement or run the DOCS methodology. |
| Experiment Setup | No | The paper describes its proposed methodology (DOCS) and analyses performed on existing LLMs. It does not provide details on experimental setup such as hyperparameters, training configurations, learning rates, batch sizes, or optimization settings, as it is not involved in training new models. |
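The pseudocode in the table can be sketched directly in Python. This is a minimal reading of Algorithm 1, not the authors' code (which is not released): it assumes `scipy.stats.gumbel_r.fit` for the maximum-likelihood Gumbel fit, and it takes the final index to be the plain sum u_X + u_Y as written in the extracted pseudocode.

```python
import numpy as np
from scipy.stats import gumbel_r


def max_cos_sim(A, B):
    """For each column A_j, the largest |cosine similarity| against any column of B."""
    An = A / np.linalg.norm(A, axis=0, keepdims=True)
    Bn = B / np.linalg.norm(B, axis=0, keepdims=True)
    C = An.T @ Bn  # C[j, k] = cosine similarity between A_j and B_k
    return np.max(np.abs(C), axis=1)


def docs_similarity(X, Y):
    """DOCS-style index: Gumbel location parameters of the two max-cos-sim vectors, summed.

    The sum u_X + u_Y follows the extracted pseudocode; it makes the index
    symmetric in X and Y by construction.
    """
    s_X = max_cos_sim(X, Y)
    s_Y = max_cos_sim(Y, X)
    u_X, _ = gumbel_r.fit(s_X)  # fit returns (loc, scale); loc is the parameter u
    u_Y, _ = gumbel_r.fit(s_Y)
    return u_X + u_Y
```

Since each max-|cosine| value lies in (0, 1], the fitted location parameters, and hence the index, stay bounded, and swapping the arguments leaves the result unchanged.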
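To approximate the paper's adjacent-layer analysis, one would pull the same named weight matrix (e.g. MLP-UP) from every transformer layer and evaluate a similarity index on each pair of layers, yielding the heatmaps of Figures 2 and 3. The sketch below uses random stand-in matrices instead of a real checkpoint, and a simple mean-max-|cosine| index rather than the full DOCS (which adds the Gumbel fit on top); with a Llama-style model in Hugging Face transformers, the matrices would typically come from attribute paths like `model.model.layers[i].mlp.up_proj.weight`, though the exact path varies by architecture.

```python
import numpy as np


def mean_max_abs_cos(X, Y):
    """Symmetrized mean of max |cosine similarity| between columns (a DOCS-like stand-in)."""
    Xn = X / np.linalg.norm(X, axis=0, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=0, keepdims=True)
    C = np.abs(Xn.T @ Yn)
    return 0.5 * (C.max(axis=1).mean() + C.max(axis=0).mean())


def layer_similarity_matrix(weights, sim_fn):
    """Pairwise similarity between per-layer weight matrices (the heatmap data)."""
    L = len(weights)
    S = np.zeros((L, L))
    for i in range(L):
        for j in range(L):
            S[i, j] = sim_fn(weights[i], weights[j])
    return S


# Random stand-ins for, e.g., the MLP-UP matrix of each layer of a 6-layer model;
# with a real checkpoint these would be read from the loaded model's state dict.
rng = np.random.default_rng(0)
layers = [rng.standard_normal((64, 32)) for _ in range(6)]
S = layer_similarity_matrix(layers, mean_max_abs_cos)
```

High values just off the main diagonal of `S` would correspond to the paper's observation that adjacent layers exhibit high weight similarity and form clusters.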