Open Problems in Mechanistic Interpretability

Authors: Lee Sharkey, Bilal Chughtai, Joshua Batson, Jack Lindsey, Jeffrey Wu, Lucius Bushnaq, Nicholas Goldowsky-Dill, Stefan Heimersheim, Alejandro Ortega, Joseph Isaac Bloom, Stella Biderman, Adrià Garriga-Alonso, Arthur Conmy, Neel Nanda, Jessica Mary Rumbelow, Martin Wattenberg, Nandi Schoots, Joseph Miller, William Saunders, Eric J Michaud, Stephen Casper, Max Tegmark, David Bau, Eric Todd, Atticus Geiger, Mor Geva, Jesse Hoogland, Daniel Murfet, Thomas McGrath

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This forward-facing review discusses the current frontier of mechanistic interpretability and the open problems that the field may benefit from prioritizing.
Researcher Affiliation | Collaboration | Lee Sharkey (Apollo Research); Bilal Chughtai (Apollo Research); Joshua Batson (Anthropic); Jack Lindsey (Anthropic); Jeff Wu (Anthropic); Lucius Bushnaq (Apollo Research); Nicholas Goldowsky-Dill (Apollo Research); Stefan Heimersheim (Apollo Research); Alejandro Ortega (Apollo Research); Joseph Bloom (Decode Research); Stella Biderman (EleutherAI); Adrià Garriga-Alonso (FAR AI); Arthur Conmy (Google DeepMind); Neel Nanda (Google DeepMind); Jessica Rumbelow (Leap Laboratories); Martin Wattenberg (Harvard University); Nandi Schoots (King's College London and Imperial College London); Joseph Miller (MATS); William Saunders (METR); Eric J. Michaud (MIT); Stephen Casper (MIT); Max Tegmark (MIT); David Bau (Northeastern University); Eric Todd (Northeastern University); Atticus Geiger (Pr(Ai)²R Group); Mor Geva (Tel Aviv University); Jesse Hoogland (Timaeus); Daniel Murfet (University of Melbourne); Tom McGrath (Goodfire)
Pseudocode | No | The paper describes methods conceptually and illustrates them in figures (e.g., Figure 2, Figure 3), but it contains no formal pseudocode or algorithm blocks.
Open Source Code | No | The paper makes no explicit statement about releasing source code and provides no link to a code repository.
Open Datasets | No | As a review of open problems in mechanistic interpretability, the paper conducts no experiments of its own and uses no datasets for empirical validation, so it provides no concrete dataset access information.
Dataset Splits | No | The paper is a review and conducts no experiments of its own, so it provides no dataset split information.
Hardware Specification | No | The paper is a review of open problems and performs no experiments, so no hardware specifications are provided.
Software Dependencies | No | The paper is a review and does not describe a new methodology that would require specific software dependencies or version numbers for replication.
Experiment Setup | No | As a forward-facing review of open problems, the paper presents no experimental results of its own and therefore includes no experimental setup or hyperparameter details.