The Computational Complexity of Circuit Discovery for Inner Interpretability

Authors: Federico Adolfi, Martina G. Vilas, Todd Wareham

ICLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical To address this, we study circuit discovery with classical and parameterized computational complexity theory: (1) we describe a conceptual scaffolding to reason about circuit finding queries in terms of affordances for description, explanation, prediction and control; (2) we formalize a comprehensive set of queries for mechanistic explanation, and propose a formal framework for their analysis; (3) we use it to settle the complexity of many query variants and relaxations of practical interest on multi-layer perceptrons. Our findings reveal a challenging complexity landscape. Many queries are intractable, remain fixed-parameter intractable relative to model/circuit features, and inapproximable under additive, multiplicative, and probabilistic approximation schemes. To navigate this landscape, we prove there exist transformations to tackle some of these hard problems with better-understood heuristics, and prove the tractability or fixed-parameter tractability of more modest queries which retain useful affordances.
Researcher Affiliation Academia Federico Adolfi ESI Neuroscience, Max-Planck Society & University of Bristol EMAIL Martina G. Vilas Department of Computer Science Goethe University Frankfurt EMAIL Todd Wareham Department of Computer Science Memorial University of Newfoundland EMAIL
Pseudocode No The paper defines various problems (e.g., Problem 1. BOUNDED LOCAL SUFFICIENT CIRCUIT (BLSC)) and discusses algorithms (e.g., "We describe an efficient algorithm to compute it..."), but it does not include any structured pseudocode or algorithm blocks in the main text.
Open Source Code No The paper does not contain any explicit statement about releasing source code for the methodology described, nor does it provide any links to a code repository or mention code in supplementary materials.
Open Datasets No The paper is theoretical, focusing on computational complexity analysis. It does not describe or use any specific datasets in its own methodology or experiments, therefore it does not provide access information for any public datasets.
Dataset Splits No The paper is a theoretical work on computational complexity and does not conduct experiments involving datasets. Therefore, there is no information provided regarding dataset splits.
Hardware Specification No The paper is theoretical and does not describe any experimental setup that would require specific hardware. No hardware specifications are mentioned.
Software Dependencies No The paper is theoretical and does not describe any experimental setup that would require specific software dependencies with version numbers. No such information is provided.
Experiment Setup No The paper is theoretical and focuses on computational complexity analysis. It does not describe any experimental setup, hyperparameters, or system-level training settings.