Mechanistic Permutability: Match Features Across Layers

Authors: Nikita Balagansky, Ian Maksimov, Daniil Gavrilov

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments on the Gemma 2 language model, we demonstrate that our method effectively captures feature evolution across layers, improving feature matching quality. We also show that features persist over several layers and that our approach can approximate hidden states across layers.
Researcher Affiliation | Collaboration | Nikita Balagansky (T-Tech; Moscow Institute of Physics and Technology), Ian Maksimov (HSE University; T-Tech), Daniil Gavrilov (T-Tech)
Pseudocode | No | The paper describes its methods using mathematical formulas and conceptual descriptions, such as equations (1), (2), and (4), and figures like Figure 2, which illustrates the folding process. However, it contains no clearly labeled pseudocode or algorithm blocks with structured, code-like steps.
Open Source Code | No | The paper does not provide an explicit statement about releasing its own source code or a direct link to a code repository for the 'SAE Match' methodology described. It refers to open-sourced SAEs from other works (Lieberum et al., 2024) and datasets, but not its specific implementation code.
Open Datasets | Yes | "We tested our methods on subsets of Open Web Text (Gokaslan et al., 2019), Code (https://huggingface.co/datasets/loubnabnl/github-small-near-dedup), and WikiText (Merity et al., 2016). From each dataset, we randomly sampled 100 examples, truncated them to 1,024 tokens, and excluded the beginning-of-sequence (BOS) token when calculating metrics."
Dataset Splits | No | "From each dataset, we randomly sampled 100 examples, truncated them to 1,024 tokens, and excluded the beginning-of-sequence (BOS) token when calculating metrics." This describes a sampling strategy but does not specify training, validation, or test splits for models or evaluation.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper mentions using 'GPT-4o' and 'GPT-4o mini' for external LLM evaluation, but these are models/services, not specific software dependencies with version numbers for the core methodology's implementation (e.g., Python, PyTorch versions). No other software dependencies with version numbers are listed.
Experiment Setup | Yes | "For matching we used MSE from both decoder and encoder layers. During our initial experiments, we observed that the decoder-only option performs similarly to our scheme, while the encoder-only suffers from poor quality of matching (see Appendix Figure 12 for comparison). Each experiment involved approximately 1,600 LLM evaluations over 100 feature paths spanning 16 layers (details in Appendix Section C). We tested our methods on subsets of Open Web Text (Gokaslan et al., 2019), Code (https://huggingface.co/datasets/loubnabnl/github-small-near-dedup), and WikiText (Merity et al., 2016). From each dataset, we randomly sampled 100 examples, truncated them to 1,024 tokens, and excluded the beginning-of-sequence (BOS) token when calculating metrics."
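The quoted evaluation-data protocol (100 sampled examples per dataset, truncation to 1,024 tokens, BOS excluded from metrics) can be sketched as follows. This is a hypothetical illustration, not the authors' code: `BOS_ID`, the seed, and the toy corpus are stand-ins for the real tokenizer and datasets.

```python
import random

# Sketch of the sampling protocol: draw 100 documents per dataset,
# truncate each to 1,024 tokens, and strip the leading BOS token
# before metrics are computed. BOS_ID is an assumed placeholder.

BOS_ID = 2
MAX_LEN = 1024
N_SAMPLES = 100

def prepare_eval_tokens(corpus, seed=0):
    """Sample documents, truncate, and drop the leading BOS token."""
    rng = random.Random(seed)
    docs = rng.sample(corpus, min(N_SAMPLES, len(corpus)))
    prepared = []
    for tokens in docs:
        tokens = tokens[:MAX_LEN]            # truncate to 1,024 tokens
        if tokens and tokens[0] == BOS_ID:   # exclude BOS from metrics
            tokens = tokens[1:]
        prepared.append(tokens)
    return prepared

# Toy corpus: 150 "documents", each beginning with BOS.
corpus = [[BOS_ID] + [(i + j) % 1000 + 3 for j in range(1500)]
          for i in range(150)]
batches = prepare_eval_tokens(corpus)
print(len(batches), max(len(b) for b in batches))  # 100 1023
```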
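The matching objective described in the setup (MSE from both decoder and encoder layers) can be illustrated with a small sketch: features of one layer's SAE are assigned one-to-one to features of the next layer's SAE by minimizing the summed decoder- and encoder-weight MSE, treated here as a linear assignment problem. The random weights, dimensions, and use of the Hungarian algorithm are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Sketch: match layer-l SAE features to layer-(l+1) features by
# minimizing summed MSE over decoder and encoder weight vectors.
# Random weights below are stand-ins for real trained SAEs.

rng = np.random.default_rng(0)
d_model, n_feat = 64, 128

dec_a, dec_b = rng.normal(size=(2, n_feat, d_model))  # per-feature decoder rows
enc_a, enc_b = rng.normal(size=(2, n_feat, d_model))  # per-feature encoder rows

def pairwise_mse(a, b):
    """MSE between every row of `a` and every row of `b` -> (n_feat, n_feat)."""
    diff = a[:, None, :] - b[None, :, :]
    return (diff ** 2).mean(axis=-1)

cost = pairwise_mse(dec_a, dec_b) + pairwise_mse(enc_a, enc_b)
rows, cols = linear_sum_assignment(cost)  # optimal one-to-one assignment
# cols[i] is the layer-(l+1) feature matched to layer-l feature i.
print(sorted(cols.tolist()) == list(range(n_feat)))  # True: a permutation
```

The assignment returns a permutation of feature indices, which is the sense in which features can be "matched" across layers.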