IBCircuit: Towards Holistic Circuit Discovery with Information Bottleneck

Authors: Tian Bian, Yifan Niu, Chaohao Yuan, Chengzhi Piao, Bingzhe Wu, Long-Kai Huang, Yu Rong, Tingyang Xu, Hong Cheng, Jia Li

ICML 2025

Reproducibility assessment (variable, result, and LLM response):
Research Type: Experimental. We conducted experiments on the Indirect Object Identification (IOI) and Greater-Than tasks, verifying that IBCircuit identifies more faithful and minimal circuits, in terms of selecting critical node and edge components, than baseline methods. The evaluation answers the following research questions (RQs): RQ1 (Grounded in Previous Work): Can IBCircuit reproduce circuits that previous work identified as explaining model behavior on these tasks? RQ2 (Ablation Study): Are both the KL loss and the MI loss used to train IBCircuit necessary? How does varying α affect IBCircuit's effectiveness, and what does each component contribute? RQ3 (Faithfulness & Minimality): Does IBCircuit avoid including components that do not participate in the behavior while maintaining better faithfulness? RQ4 (Scalability to Large Models): Can IBCircuit scale to large models?
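The ablation in RQ2 weighs a KL faithfulness term against an α-weighted compression term. A minimal sketch of how such a combined objective can be computed is below; this is our illustrative formulation, not the paper's exact loss, and `ib_circuit_loss` plus the mask-mean sparsity surrogate for the MI term are assumptions:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two categorical distributions."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def ib_circuit_loss(p_full, p_circuit, mask_probs, alpha=1.0):
    """Hypothetical IB-style objective (sketch, not the paper's exact loss):
    keep the circuit's output distribution close to the full model's (KL term)
    while compressing the component mask (here a simple sparsity surrogate
    standing in for the MI term)."""
    kl_term = kl_divergence(p_full, p_circuit)
    # Surrogate for the compression/MI term: expected fraction of retained components.
    mi_term = float(np.mean(mask_probs))
    return kl_term + alpha * mi_term
```

A larger α pushes the mask toward sparsity at the cost of a larger KL divergence, which is the trade-off RQ2's α-sensitivity study probes.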
Researcher Affiliation: Collaboration. (1) Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, China; (2) DAMO Academy, Alibaba Group, Hangzhou, China; (3) Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China; (4) Department of Computer Science, Hong Kong Baptist University, Hong Kong, China; (5) School of Artificial Intelligence, Shenzhen University, Shenzhen, China; (6) Tencent, Shenzhen, China; (7) Hupan Lab, Hangzhou, China.
Pseudocode: No. The paper describes its methods in prose and mathematical equations (e.g., Section 4, 'IBCircuit', and Section A, 'Analysis of IBCircuit') but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code: Yes. The code is available at https://github.com/ivanniu/IBCircuit.
Open Datasets: Yes. Tasks: We primarily focus on GPT-2 (Radford et al., 2019) for better evaluation, as it is a classical model typically studied from a circuit's perspective. We intentionally choose two tasks (IOI and Greater-Than) that have been studied before, for fair comparison with previous work. Indirect Object Identification (IOI) (Wang et al., 2022): ... Greater-Than (Hanna et al., 2024): ...
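To make the IOI task concrete: the model must complete a sentence with the indirect object (the name mentioned once) rather than the repeated subject. A minimal sketch of constructing such prompts, with an illustrative name list and template (the specific names, template, and `make_ioi_example` helper are our assumptions, not taken from the paper's data pipeline):

```python
import random

# Illustrative IOI-style prompt construction (after Wang et al., 2022).
NAMES = ["Mary", "John", "Tom", "Anna"]
TEMPLATE = "When {A} and {B} went to the store, {B} gave a drink to"

def make_ioi_example(rng):
    # A appears once (the indirect object, the correct answer);
    # B appears twice (the repeated subject, the distractor).
    a, b = rng.sample(NAMES, 2)
    prompt = TEMPLATE.format(A=a, B=b)
    return prompt, a

rng = random.Random(0)
prompt, answer = make_ioi_example(rng)
```

Corrupted counterparts for activation patching are typically built by swapping names so the indirect-object signal is destroyed while the sentence structure is preserved.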
Dataset Splits: No. The paper discusses the IOI and Greater-Than tasks and uses 'randomly selected activations from the corrupted dataset for patching', but does not specify explicit training, validation, or test splits (e.g., percentages or counts) needed to reproduce the experiments.
Hardware Specification: No. The paper does not explicitly mention any specific hardware details, such as GPU models, CPU types, or memory configurations, used for running the experiments.
Software Dependencies: No. The paper mentions using the Adam optimizer and refers generally to the Transformer architecture and language models, but does not specify versions for any key software components or libraries (e.g., Python, PyTorch, TensorFlow, or CUDA).
Experiment Setup: Yes. For node-level circuit discovery, we set the learning rate to 0.05, trained for 1300 epochs using the Adam optimizer for the IB weights, and set α to 1. For edge-level circuit discovery, we set the learning rate to 0.1, trained for 3000 epochs using the Adam optimizer for the IB weights with a learning-rate warm-up scheduler (200 warm-up steps), and set α between 0.01 and 1 to balance circuit sparsity against the KL divergence (see the analysis of the parameter α in Section 5.3).
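The edge-level warm-up schedule can be sketched as a simple linear ramp to the base rate; the paper specifies only the base learning rate (0.1) and 200 warm-up steps, so the linear shape and the `lr_at_step` helper below are our assumptions:

```python
def lr_at_step(step, base_lr=0.1, warmup_steps=200):
    """Hypothetical linear learning-rate warm-up matching the paper's
    edge-level setup (base lr 0.1, 200 warm-up steps). The exact ramp
    shape is an assumption; only the base rate and step count are given."""
    if step < warmup_steps:
        # Ramp linearly from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    return base_lr
```

In a PyTorch setup this ramp would typically be attached to the Adam optimizer via a lambda-based scheduler, but any scheduler that reaches the base rate after 200 steps fits the description in the paper.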