Demystifying Singular Defects in Large Language Models

Authors: Haoqi Wang, Tong Zhang, Mathieu Salzmann

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate these findings on a variety of LLMs, including LLaMA2 (Touvron et al., 2023), Phi3 (Abdin et al., 2024), MPT (Team, 2023), Pythia (Biderman et al., 2023), Vicuna1.5 (Platzer & Puschner, 2021), Falcon2 (Malartic et al., 2024), GPT2 (Radford et al., 2019), and Qwen2.5 (Team, 2024), to name a few.
Researcher Affiliation | Academia | (1) School of Computer and Communication Sciences, EPFL, Switzerland; (2) University of Chinese Academy of Sciences, China; (3) Swiss Data Science Center, Switzerland.
Pseudocode | No | The paper describes its methods through mathematical formulations and textual explanations, but it contains no explicitly labeled pseudocode blocks or algorithms.
Open Source Code | Yes | Code is released at https://github.com/haoqiwang/singular_defect.
Open Datasets | Yes | Taking LLaMA2-7B as an example, we extract the hidden states of 1K random rows from the WikiText2-v1 dataset (Merity et al., 2017) across all layers and compute the norm of each token in each layer.
Dataset Splits | Yes | Taking LLaMA2-7B as an example, we extract the hidden states of 1K random rows from the WikiText2-v1 dataset (Merity et al., 2017) across all layers and compute the norm of each token in each layer.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory) used to conduct its experiments or analysis.
Software Dependencies | No | The paper mentions various LLMs and quantization techniques but does not specify the versions of key software libraries (e.g., Python, PyTorch, CUDA) used in its own experimental setup.
Experiment Setup | No | The paper describes its analytical methodology and the observed phenomena in LLMs. While it discusses aspects such as quantization strategies, it does not provide specific hyperparameters (e.g., learning rate, batch size, number of epochs) or detailed system-level training settings for reproducing the experiments.
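The extraction step quoted in the dataset rows above (per-token, per-layer hidden-state norms) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' released code: the dimensions, the `token_norm` helper, and the random tensors are all stand-ins for real LLaMA2-7B hidden states extracted from WikiText2-v1 rows.

```python
import math
import random

# Hypothetical, deliberately tiny dimensions standing in for LLaMA2-7B
# (which has 32 layers and hidden size 4096).
NUM_LAYERS, SEQ_LEN, HIDDEN_DIM = 4, 8, 16
random.seed(0)

def token_norm(vec):
    """L2 norm of one token's hidden-state vector."""
    return math.sqrt(sum(x * x for x in vec))

# hidden_states[layer][token] is a HIDDEN_DIM-dimensional vector; in the
# paper's setting these come from forwarding dataset rows through the model.
hidden_states = [
    [[random.gauss(0.0, 1.0) for _ in range(HIDDEN_DIM)] for _ in range(SEQ_LEN)]
    for _ in range(NUM_LAYERS)
]

# Norm of each token in each layer -> NUM_LAYERS x SEQ_LEN grid of scalars.
norms = [[token_norm(tok) for tok in layer] for layer in hidden_states]
print(len(norms), len(norms[0]))  # prints: 4 8
```

In the paper's analysis, unusually large entries in such a norm grid are what identify high-norm outlier tokens in specific layers.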