Rethinking Evaluation of Sparse Autoencoders through the Representation of Polysemous Words
Authors: Gouki Minegishi, Hiroki Furuta, Yusuke Iwasawa, Yutaka Matsuo
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we propose a suite of evaluations for SAEs to analyze the quality of monosemantic features by focusing on polysemous words. Our findings reveal that SAEs developed to improve the MSE-L0 Pareto frontier may confuse interpretability, which does not necessarily enhance the extraction of monosemantic features. The analysis of SAEs with polysemous words can also figure out the internal mechanism of LLMs; deeper layers and the Attention module contribute to distinguishing polysemy in a word. |
| Researcher Affiliation | Academia | Gouki Minegishi, Hiroki Furuta, Yusuke Iwasawa, Yutaka Matsuo — The University of Tokyo |
| Pseudocode | No | The paper includes mathematical formulations like equations (1) to (4) but does not contain explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps. |
| Open Source Code | Yes | Code: https://github.com/gouki510/PS-Eval |
| Open Datasets | Yes | Dataset: https://huggingface.co/datasets/gouki510/Wic_data_for_SAE-Eval |
| Dataset Splits | No | The paper states that the PS-Eval dataset consists of "1112 (label 0: 556, label 1: 556)" samples for evaluation (Table 1) and describes the WiC and Red Pajama datasets for training data. However, it does not provide specific training/test/validation splits (e.g., percentages or exact counts) for their own SAE training experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions "GPT2-small" as the base LLM used, but does not list any specific software dependencies (e.g., libraries, frameworks) with version numbers. |
| Experiment Setup | Yes | Following prior work (Templeton et al., 2024), we use an expand ratio of R = 32 and a sparsity regularization factor of λ = 0.05 by default for training SAE. The base LLM used as activations for the SAE is GPT-2 small (Radford et al., 2019). Unless specified otherwise, activations are extracted from the 4th layer. ... (Table 4) Batch Size 8192, Total Training Steps 200,000, Learning Rate 2e-4, Context Size 256. |
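The setup row above fully specifies the SAE training objective: an expand ratio of R = 32 over GPT-2 small's 768-dimensional activations, with an L1 sparsity penalty weighted by λ = 0.05 added to the MSE reconstruction loss. A minimal numpy sketch of that objective follows; it is an illustration of the stated hyperparameters, not the authors' released code, and the weight initialization and batch here are placeholder assumptions.

```python
import numpy as np

# Hedged sketch of the SAE objective implied by the setup row (not the paper's code).
D_MODEL = 768                     # GPT-2 small hidden size
EXPAND_RATIO = 32                 # R = 32 (from the paper's setup)
LAMBDA_SPARSE = 0.05              # λ = 0.05 (from the paper's setup)
D_HIDDEN = D_MODEL * EXPAND_RATIO

rng = np.random.default_rng(0)
# Placeholder initialization; the paper does not specify an init scheme.
W_enc = rng.normal(0.0, 0.02, (D_MODEL, D_HIDDEN))
b_enc = np.zeros(D_HIDDEN)
W_dec = rng.normal(0.0, 0.02, (D_HIDDEN, D_MODEL))
b_dec = np.zeros(D_MODEL)

def sae_loss(x):
    """MSE reconstruction loss plus λ-weighted L1 penalty on feature activations."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)        # ReLU feature activations
    x_hat = f @ W_dec + b_dec                     # reconstruction
    mse = np.mean((x - x_hat) ** 2)
    l1 = np.mean(np.sum(np.abs(f), axis=-1))      # per-sample L1, averaged
    return mse + LAMBDA_SPARSE * l1, f

# Stand-in batch for layer-4 residual-stream activations.
x = rng.normal(0.0, 1.0, (8, D_MODEL))
loss, features = sae_loss(x)
print(loss, features.shape)
```

The L1 term drives most feature activations to zero, which is what the paper's MSE-L0 trade-off discussion refers to: lowering λ improves reconstruction (MSE) at the cost of more active features (higher L0).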