Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability

Authors: Atticus Geiger, Duligur Ibeling, Amir Zur, Maheep Chaudhary, Sonakshi Chauhan, Jing Huang, Aryaman Arora, Zhengxuan Wu, Noah Goodman, Christopher Potts, Thomas Icard

JMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Causal abstraction provides a theoretical foundation for mechanistic interpretability, the field concerned with providing intelligible algorithms that are faithful simplifications of the known, but opaque low-level details of black box AI models. Our contributions are (1) generalizing the theory of causal abstraction from mechanism replacement (i.e., hard and soft interventions) to arbitrary mechanism transformation (i.e., functionals from old mechanisms to new mechanisms), (2) providing a flexible, yet precise formalization for the core concepts of polysemantic neurons, the linear representation hypothesis, modular features, and graded faithfulness, and (3) unifying a variety of mechanistic interpretability methods in the common language of causal abstraction...
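The move from mechanism replacement to mechanism transformation described above can be illustrated on a toy structural causal model. The sketch below is hypothetical (the model, variable names, and helpers are not from the paper): a hard intervention discards a mechanism and substitutes a constant, while a mechanism transformation is a functional that maps the old mechanism to a new one.

```python
# Toy structural causal model X -> Y -> Z, with mechanisms stored as functions.
# Minimal hypothetical sketch, not the paper's formal definitions.

def solve(mechanisms):
    """Run mechanisms in topological order X, Y, Z and return the full setting."""
    vals = {}
    vals["X"] = mechanisms["X"](vals)
    vals["Y"] = mechanisms["Y"](vals)
    vals["Z"] = mechanisms["Z"](vals)
    return vals

base = {
    "X": lambda v: 2,
    "Y": lambda v: v["X"] + 1,
    "Z": lambda v: 2 * v["Y"],
}

# Hard intervention: replace Y's mechanism with the constant 10.
hard = dict(base, Y=lambda v: 10)

# Mechanism transformation: a functional from the old mechanism for Y
# to a new one (here, negating its output) rather than discarding it.
def negate(old):
    return lambda v: -old(v)

transformed = dict(base, Y=negate(base["Y"]))

print(solve(base)["Z"])         # 2 * (2 + 1) = 6
print(solve(hard)["Z"])         # 2 * 10 = 20
print(solve(transformed)["Z"])  # 2 * (-(2 + 1)) = -6
```

The hard intervention ignores Y's parents entirely; the transformation keeps the old mechanism as an argument, which is what makes it strictly more general.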
Researcher Affiliation | Academia | Atticus Geiger, Duligur Ibeling, Amir Zur, Maheep Chaudhary, Sonakshi Chauhan, Jing Huang, Aryaman Arora, Zhengxuan Wu, Noah Goodman, Christopher Potts, Thomas Icard; Pr(Ai)2R Group; Stanford University. Corresponding authors: EMAIL; EMAIL
Pseudocode | Yes | Algorithm 1: Scrub(b, 𝐇)
     2: for H ∈ 𝐇 do
     3:     if H ∈ X_In^L then
     4:         h ← h ∪ Proj_H(b)
     7:     for G ∈ {G : (G, H) ∈ C} do
     8:         s ∼ Val(X_In^L)
     9:         g ← g ∪ Scrub(s, {G})
    10:     if H ∈ Domain(δ) then
    11:         s ∼ {s ∈ Val(X_In^L) : Proj_δ(H)(Solve(H_s)) = Proj_δ(H)(Solve(H_b))}
    12:         g ← g ∪ Scrub(s, {G : (G, H) ∈ C})
    13:     h ← h ∪ Proj_H(Solve(L_g))
    14: return h
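The core move in the Scrub pseudocode is resampling: a base input is swapped for a random input that agrees with it on the aligned high-level variable, and a faithful hypothesis predicts the low-level output is unchanged. The toy sketch below (the parity model, input grid, and helper names are all hypothetical, and it covers only this resampling step, not the full recursive algorithm) makes that idea concrete.

```python
import random

# Hypothetical toy setting for the resampling step of causal scrubbing:
# the hypothesis is that the low-level model depends on its input only
# through the high-level variable "parity".

random.seed(0)

INPUTS = [(a, b) for a in range(4) for b in range(4)]

def high_parity(x):
    """Aligned high-level variable: parity of the input sum."""
    return (x[0] + x[1]) % 2

def low_model(x):
    """Low-level 'network'; here it genuinely depends only on parity."""
    return ((x[0] + x[1]) % 2) * 7

def scrub_input(base):
    """Resample an input agreeing with `base` on the high-level variable."""
    pool = [s for s in INPUTS if high_parity(s) == high_parity(base)]
    return random.choice(pool)

base = (3, 2)
scrubbed = scrub_input(base)
# Faithful hypothesis: scrubbing leaves the low-level output unchanged.
assert low_model(scrubbed) == low_model(base)
```

If the low-level model depended on more than parity, the assertion would fail for some resamples, which is exactly how scrubbing grades faithfulness.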
Open Source Code | Yes | We provide a companion Jupyter notebook that walks through this example.
Open Datasets | No | The paper uses illustrative examples such as the hierarchical equality task and the bubble sort algorithm to demonstrate its theoretical framework. It refers to benchmarks like CEBaB in an illustrative capacity, not as datasets for empirical experiments presented in the paper, so it does not provide access information for specific evaluation datasets.
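For concreteness, the hierarchical equality task mentioned above maps four inputs to whether the equality relation within the first pair matches that within the second pair. A minimal sketch (function name hypothetical):

```python
def hierarchical_equality(a, b, c, d):
    """True iff the first pair and the second pair agree on equality."""
    return (a == b) == (c == d)

print(hierarchical_equality(1, 1, 2, 2))  # True: both pairs equal
print(hierarchical_equality(1, 1, 2, 3))  # False: only the first pair is equal
print(hierarchical_equality(1, 2, 3, 4))  # True: both pairs unequal
```

The task's two intermediate equality judgments are what make it a natural testbed for high-level causal variables.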
Dataset Splits | No | The paper is theoretical, presenting a causal abstraction framework with illustrative examples. It does not conduct empirical experiments on datasets with train/validation/test splits, so no split information is provided.
Hardware Specification | No | The paper focuses on theoretical contributions and illustrative examples of causal abstraction. It does not describe experiments that would require hardware details such as GPU or CPU models.
Software Dependencies | No | The paper develops a conceptual framework. While it mentions a companion Jupyter notebook, it does not specify software libraries or solvers with version numbers needed to reproduce experiments.
Experiment Setup | No | The paper presents a theoretical foundation illustrated with examples. It does not conduct empirical experiments that would require hyperparameters, model initialization, training schedules, or other setup details.