Demystifying Singular Defects in Large Language Models
Authors: Haoqi Wang, Tong Zhang, Mathieu Salzmann
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate these findings on a variety of LLMs, including LLaMA2 (Touvron et al., 2023), Phi3 (Abdin et al., 2024), MPT (Team, 2023), Pythia (Biderman et al., 2023), Vicuna1.5 (Platzer & Puschner, 2021), Falcon2 (Malartic et al., 2024), GPT2 (Radford et al., 2019), Qwen2.5 (Team, 2024), to name a few. |
| Researcher Affiliation | Academia | 1School of Computer and Communication Sciences, EPFL, Switzerland 2University of Chinese Academy of Sciences, China 3Swiss Data Science Center, Switzerland. |
| Pseudocode | No | The paper describes methods using mathematical formulations and textual explanations, but it does not contain any explicitly labeled pseudocode blocks or algorithms. |
| Open Source Code | Yes | Code is released at https://github.com/haoqiwang/singular_defect. |
| Open Datasets | Yes | Taking LLaMA2-7B as an example, we extract the hidden states of 1K random rows from the WikiText2-v1 dataset (Merity et al., 2017) across all layers and compute the norm of each token in each layer. |
| Dataset Splits | Yes | Taking LLaMA2-7B as an example, we extract the hidden states of 1K random rows from the WikiText2-v1 dataset (Merity et al., 2017) across all layers and compute the norm of each token in each layer. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to conduct its experiments or analysis. |
| Software Dependencies | No | The paper mentions various LLMs and quantization techniques but does not specify the versions of key software libraries (e.g., Python, PyTorch, CUDA) used for its own experimental setup. |
| Experiment Setup | No | The paper describes the analytical methodology and observed phenomena in LLMs. While it discusses aspects like quantization strategies, it does not provide specific hyperparameters (e.g., learning rate, batch size, number of epochs) or detailed system-level training settings for reproducing experiments. |
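The Open Datasets row quotes the paper's procedure of computing the norm of each token's hidden state in every layer. A minimal NumPy sketch of that computation is below, assuming the hidden states have already been extracted; the array shape, the random stand-in data, and the 10x-median outlier threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical stand-in for the extracted activations: the paper describes
# hidden states of 1K random WikiText2-v1 rows across all layers of LLaMA2-7B.
# Shape: (num_layers, num_tokens, hidden_dim); random values for illustration.
rng = np.random.default_rng(0)
num_layers, num_tokens, hidden_dim = 32, 128, 4096
hidden_states = rng.standard_normal((num_layers, num_tokens, hidden_dim))

# Per-token L2 norm in each layer, as in the quoted methodology.
token_norms = np.linalg.norm(hidden_states, axis=-1)  # (num_layers, num_tokens)

# Flag tokens whose norm far exceeds the layer median (the 10x factor is an
# assumed threshold, chosen here only to make the sketch concrete).
median_per_layer = np.median(token_norms, axis=1, keepdims=True)
outlier_mask = token_norms > 10 * median_per_layer
```

With real activations (e.g., collected via a forward pass that records every layer's output), `token_norms` is the quantity whose layer-wise spikes the paper analyzes.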