Validating Mechanistic Interpretations: An Axiomatic Approach

Authors: Nils Palumbo, Ravi Mangal, Zifan Wang, Saranya Vijayakumar, Corina S. Pasareanu, Somesh Jha

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We demonstrate the applicability of our axioms for validating mechanistic interpretations via two case studies: the existing, well-known analysis by Nanda et al. (2023a) of a model with a Transformer-based architecture (Vaswani et al., 2017) trained to solve modular addition, and a new analysis... Our implementation is available at https://github.com/nilspalumbo/axiomatic-validation.
Researcher Affiliation Collaboration ¹University of Wisconsin-Madison, ²Colorado State University, ³Scale AI, ⁴Carnegie Mellon University.
Pseudocode Yes Listing 1: Abstraction operators for the first block in Python pseudocode.
Open Source Code Yes Our implementation is available at https://github.com/nilspalumbo/axiomatic-validation.
Open Datasets No We construct a dataset of randomly generated 2-SAT formulas with ten clauses and up to five variables, eliminating syntactic duplicates. We use a solver (Z3 (De Moura & Bjørner, 2008)) to check satisfiability for each formula. We built a balanced dataset of 10^6 SAT and 10^6 UNSAT instances; we used 60% (also balanced) for training, and the rest for testing. For model analysis, we construct a separate dataset with 10^5 SAT and 10^5 UNSAT instances split as before into train and test sets.
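The dataset-construction procedure quoted above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: function names are hypothetical, and the satisfiability check brute-forces all 2^5 assignments instead of calling Z3 as the paper does.

```python
import itertools
import random

random.seed(0)
NUM_VARS = 5      # up to five variables, as in the paper
NUM_CLAUSES = 10  # ten clauses per formula

def random_2sat_formula():
    """A formula is a sorted list of clauses; each clause is a pair of
    literals, where literal k > 0 means variable k and k < 0 its negation."""
    clauses = set()
    while len(clauses) < NUM_CLAUSES:
        a, b = random.sample(range(1, NUM_VARS + 1), 2)  # two distinct variables
        clause = tuple(sorted((a * random.choice((-1, 1)),
                               b * random.choice((-1, 1)))))
        clauses.add(clause)  # set membership eliminates syntactic duplicates
    return sorted(clauses)

def is_satisfiable(formula):
    """Brute-force check over all assignments; the paper uses Z3 instead."""
    for bits in itertools.product([False, True], repeat=NUM_VARS):
        def lit(k):
            return bits[abs(k) - 1] if k > 0 else not bits[abs(k) - 1]
        if all(lit(a) or lit(b) for a, b in formula):
            return True
    return False

# Label a small sample; the paper balances SAT/UNSAT counts before splitting.
sample = [random_2sat_formula() for _ in range(100)]
labels = [is_satisfiable(f) for f in sample]
```

For formulas of this size the exhaustive check is instant; Z3 becomes necessary only for larger instances, but using it here keeps the labeling procedure identical to the paper's.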
Dataset Splits Yes We built a balanced dataset of 10^6 SAT and 10^6 UNSAT instances; we used 60% (also balanced) for training, and the rest for testing. For model analysis, we construct a separate dataset with 10^5 SAT and 10^5 UNSAT instances split as before into train and test sets.
Hardware Specification Yes All our experiments were run on an NVIDIA A100 GPU.
Software Dependencies Yes We use a solver (Z3 (De Moura & Bjørner, 2008)) to check satisfiability for each formula.
Experiment Setup Yes We use a two-layer ReLU decoder-only Transformer with token embeddings of dimension d = 128, learned positional embeddings, one attention head of dimension d in the first layer, four attention heads of dimension d/4 = 32 in the second, and n = 512 hidden units in the MLP (multi-layer perceptron). We use full-batch gradient descent with the AdamW optimizer, setting the learning rate γ = 0.001 and weight decay parameter λ = 1. We train for 1000 epochs.
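The reported hyperparameters can be collected into a single configuration sketch, which also makes the dimension bookkeeping easy to check. The dict layout and key names below are illustrative assumptions, not taken from the authors' repository; only the values come from the quoted setup.

```python
# Configuration sketch of the reported setup (key names are hypothetical).
config = {
    "layers": 2,
    "d_model": 128,        # token embedding dimension d
    "heads_layer1": 1,     # one attention head of dimension d
    "heads_layer2": 4,     # four attention heads of dimension d/4
    "mlp_hidden": 512,     # n hidden units in the MLP
    "activation": "ReLU",
    "optimizer": "AdamW",
    "learning_rate": 1e-3, # gamma
    "weight_decay": 1.0,   # lambda
    "epochs": 1000,
    "batching": "full-batch gradient descent",
}

# Sanity check: the four second-layer heads partition the model dimension,
# giving the per-head dimension d/4 = 32 stated in the setup.
head_dim = config["d_model"] // config["heads_layer2"]
```

A check like this catches the most common transcription error in Transformer configs: a head count that does not evenly divide the model dimension.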