Efficient Automated Circuit Discovery in Transformers using Contextual Decomposition

Authors: Aliyah Hsu, Georgia Zhou, Yeshwanth Cherapanamjeri, Yaxuan Huang, Anobel Odisho, Peter Carroll, Bin Yu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On three standard circuit evaluation datasets (indirect object identification, greater-than comparisons, and docstring completion), we demonstrate that CD-T outperforms ACDC and EAP by better recovering the manual circuits with an average of 97% ROC AUC under low runtimes. In addition, we provide evidence that faithfulness of CD-T circuits is not due to random chance by showing our circuits are 80% more faithful than random circuits of up to 60% of the original model size. All experiments are conducted on an NVIDIA A100 GPU.
Researcher Affiliation | Academia | Aliyah R. Hsu (Department of EECS, UC Berkeley); Georgia Zhou (Department of EECS, UC Berkeley); Yeshwanth Cherapanamjeri (CSAIL, MIT); Yaxuan Huang (Department of Statistics, UC Berkeley); Anobel Y. Odisho and Peter R. Carroll (Department of Urology, Epidemiology and Biostatistics, UC San Francisco); Bin Yu (Department of Statistics, EECS, Center for Computational Biology, UC Berkeley)
Pseudocode | Yes | Our complete algorithm is described in Algorithm 1, presented in the specific case where we have chosen to decompose our source nodes s so that β_s is the activation's deviation from the mean over some distribution.
Open Source Code | Yes | All code for using CD-T and reproducing results is made available on GitHub: https://github.com/adelaidehsu/CD_Circuit
Open Datasets | Yes | Specifically, for evaluation, we use three standard circuit evaluation datasets: indirect object identification (IOI) (Wang et al., 2023), greater-than comparisons (Greater-than) (Hanna et al., 2023), and docstring completion (Docstring) (Heimersheim & Janiak, 2023) (see Appendix A for details).
Dataset Splits | Yes | IOI: We identify circuits using 25 IOI samples drawn from mixed templates, and mean ablation is conducted using the corrupted ABC dataset. Another set of 100 IOI samples is used in evaluation. Greater-than: We identify circuits using a random sample of 100 datapoints provided by Hanna et al. (2023)... Another set of 100 samples is used in evaluation. Docstring: We identify circuits using a sample of 100 datapoints from the dataset provided by Heimersheim & Janiak (2023)... Another set of 100 samples is used in evaluation.
Hardware Specification | Yes | All experiments are conducted on an NVIDIA A100 GPU.
Software Dependencies | No | The paper does not explicitly mention specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | For CD-T, we test by varying the percentile of top nodes to extract in every iteration, in the range of [90, 99]. For our mean ablation, we simply take the mean of the activations over 100 negative datapoints (impossible completions, with the ending year preceding the starting century), and as above, when setting the decomposition at a source node, define the relevant component to be the deviation from the mean activation over this distribution. We identify circuits using 25 IOI samples drawn from mixed templates, and mean ablation is conducted using the corrupted ABC dataset.
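The two settings quoted in the table (the relevant component at a source node defined as the deviation from a mean activation, and extraction of the top nodes by percentile in each iteration) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the `gamma` name for the remainder term, and the use of a flat dictionary of per-node scores are all assumptions made for illustration only.

```python
import numpy as np


def mean_deviation_decomposition(activation, reference_activations):
    """Split one activation into (beta, gamma) with beta + gamma == activation.

    beta is the deviation from the mean over a reference distribution (the
    "relevant" part described in the table); gamma is the mean itself.
    Both function and variable names are hypothetical.
    """
    mean = reference_activations.mean(axis=0)  # mean over reference datapoints
    beta = activation - mean
    gamma = mean
    return beta, gamma


def select_top_nodes(scores, percentile):
    """Keep nodes whose score is at or above the given percentile.

    `scores` maps a hypothetical node name to a scalar relevance score;
    `percentile` would be varied within [90, 99] per the quoted setup.
    """
    threshold = np.percentile(list(scores.values()), percentile)
    return [node for node, score in scores.items() if score >= threshold]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(100, 8))  # e.g. 100 corrupted datapoints
    activation = rng.normal(size=8)
    beta, gamma = mean_deviation_decomposition(activation, reference)
    print(np.allclose(beta + gamma, activation))

    scores = {f"node_{i}": float(i) for i in range(100)}
    print(len(select_top_nodes(scores, 90)))
```

With a percentile closer to 99, fewer nodes survive each iteration, which would presumably yield smaller circuits at some cost in recall; the sketch only shows the thresholding step, not how CD-T computes the scores.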