Efficient Automated Circuit Discovery in Transformers using Contextual Decomposition
Authors: Aliyah Hsu, Georgia Zhou, Yeshwanth Cherapanamjeri, Yaxuan Huang, Anobel Odisho, Peter Carroll, Bin Yu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On three standard circuit evaluation datasets (indirect object identification, greater-than comparisons, and docstring completion), we demonstrate that CD-T outperforms ACDC and EAP by better recovering the manual circuits with an average of 97% ROC AUC under low runtimes. In addition, we provide evidence that faithfulness of CD-T circuits is not due to random chance by showing our circuits are 80% more faithful than random circuits of up to 60% of the original model size. All experiments are conducted on an NVIDIA A100 GPU. |
| Researcher Affiliation | Academia | Aliyah R. Hsu (Department of EECS, UC Berkeley); Georgia Zhou (Department of EECS, UC Berkeley); Yeshwanth Cherapanamjeri (CSAIL, MIT); Yaxuan Huang (Department of Statistics, UC Berkeley); Anobel Y. Odisho and Peter R. Carroll (Department of Urology, Epidemiology and Biostatistics, UC San Francisco); Bin Yu (Department of Statistics, EECS, Center for Computational Biology, UC Berkeley) |
| Pseudocode | Yes | Our complete algorithm is described in Algorithm 1, presented in the specific case where we have chosen to decompose our source nodes s so that β_s is the activation's deviation from the mean over some distribution. |
| Open Source Code | Yes | All code for using CD-T and reproducing results is made available on GitHub: https://github.com/adelaidehsu/CD_Circuit |
| Open Datasets | Yes | Specifically, for evaluation, we use three standard circuit evaluation datasets: indirect object identification (IOI) (Wang et al., 2023), greater-than comparisons (Greater-than) (Hanna et al., 2023), and docstring completion (Docstring) (Heimersheim & Janiak, 2023) (see Appendix A for details). |
| Dataset Splits | Yes | IOI: We identify circuits using 25 IOI samples drawn from mixed templates, and mean ablation is conducted using the corrupted ABC dataset. Another set of 100 IOI samples is used in evaluation. Greater-than: We identify circuits using a random sample of 100 datapoints provided by Hanna et al. (2023)... Another set of 100 samples is used in evaluation. Docstring: We identify circuits using a sample of 100 datapoints from the dataset provided by Heimersheim & Janiak (2023)... Another set of 100 samples is used in evaluation. |
| Hardware Specification | Yes | All experiments are conducted on an NVIDIA A100 GPU. |
| Software Dependencies | No | The paper does not explicitly mention specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | For CD-T, we test by varying the percentile of top nodes to extract in every iteration, in the range of [90, 99]. [Greater-than] For our mean ablation, we simply take the mean of the activations over 100 negative datapoints (impossible completions, with the ending year preceding the starting century), and as above, when setting the decomposition at a source node, define the relevant component to be the deviation from the mean activation over this distribution. [IOI] We identify circuits using 25 IOI samples drawn from mixed templates, and mean ablation is conducted using the corrupted ABC dataset. |
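
The two setup choices quoted in the table (the relevant component defined as the deviation from a mean activation, and extraction of the top percentile of nodes in each iteration) can be illustrated with a small sketch. This is not the authors' implementation: the function names `mean_deviation_split` and `top_percentile_nodes`, the plain-list activations, the per-node score dictionary, and the nearest-rank percentile rule are all assumptions made for illustration.

```python
import math
from statistics import mean


def mean_deviation_split(activation, reference_activations):
    """Split one activation vector into (relevant, irrelevant) parts.

    Assumption: the irrelevant part is the per-dimension mean over a set of
    reference activations, and the relevant part is the deviation from it.
    """
    dims = len(activation)
    mean_act = [mean(ref[i] for ref in reference_activations) for i in range(dims)]
    relevant = [a - m for a, m in zip(activation, mean_act)]
    return relevant, mean_act


def top_percentile_nodes(scores, percentile):
    """Return the names of nodes scoring at or above the given percentile.

    Assumption: `scores` maps a hypothetical node name to a relevance score,
    and the threshold uses the nearest-rank percentile definition.
    """
    ordered = sorted(scores.values())
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    threshold = ordered[rank - 1]
    return sorted(name for name, score in scores.items() if score >= threshold)


if __name__ == "__main__":
    relevant, irrelevant = mean_deviation_split(
        [1.0, 2.0], [[0.0, 1.0], [2.0, 1.0]]
    )
    # The two parts always sum back to the original activation.
    print(relevant, irrelevant)

    scores = {f"n{i}": i for i in range(10)}
    print(top_percentile_nodes(scores, 90))
```

Raising the percentile toward 99 keeps fewer nodes per iteration, which is the knob the setup row describes varying over [90, 99].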