Extracting Interpretable Task-Specific Circuits from Large Language Models for Faster Inference
Authors: Jorge García-Carrasco, Alejandro Maté, Juan Trujillo
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on different tasks and show that the resulting models are (i) considerably smaller, reducing the number of parameters by up to 82.77% and (ii) more interpretable, as they focus on the circuit that is used to carry out the specific task, and can therefore be understood using MI techniques. |
| Researcher Affiliation | Academia | Jorge García-Carrasco, Alejandro Maté, Juan Trujillo. Department of Software and Computing Systems, University of Alicante, Spain. EMAIL, EMAIL |
| Pseudocode | Yes | The pseudocode of our approach is presented in Algorithm 1. Essentially, given an LLM fθ and a dataset that elicits the specific task of interest (split into a patching dataset Da and a validation dataset Dv), our method automatically obtains a pruned model gθ that is able to perform such task. This process is controlled by several hyperparameters, namely the threshold α, the type of ablation used (either zero or mean ablation), and whether or not to prune MLPs. Algorithm 1: Automatic Task-Specific Circuit Extraction. Data: model fθ, patching dataset Da, validation dataset Dv, evaluation threshold α, ablation scheme, include_mlps. Result: pruned model gθ. gθ ← fθ; for layer ∈ [num_layers(fθ), ..., 0] do ... |
| Open Source Code | Yes | The code and data required to reproduce the experiments and figures, as well as the supplementary materials, can be found in https://github.com/jgcarrasco/circuit-extraction |
| Open Datasets | No | Given a dataset that elicits the specific task of interest (which is split into a patching dataset and a validation dataset, Da and Dv), and "Refer to Appendix A for a further discussion on the nature and curation of this dataset." The main paper does not provide concrete access information for the specific dataset used. |
| Dataset Splits | No | Given a dataset that elicits the specific task of interest (which is split into a patching dataset and a validation dataset, Da and Dv). While splits are mentioned, no specific ratios, sample counts, or detailed splitting methodology are provided to reproduce the data partitioning. |
| Hardware Specification | Yes | The experiments were performed on a RTX4090 GPU, on an estimated total of 72 hours of compute. |
| Software Dependencies | No | Our method is implemented on PyTorch (Paszke et al. 2019) by using the TransformerLens (Nanda and Bloom 2022) and Hugging Face Transformers (Wolf et al. 2020) libraries. This lists software components but does not provide specific version numbers for reproducibility. |
| Experiment Setup | Yes | This process will be controlled by several hyperparameters, namely the threshold α, the type of ablation used (either zero or mean ablation) and whether or not to prune MLPs. The thresholds were selected according to the results of the previous section, and mean ablation is used across all runs. For the baseline comparison, 'The model is trained by minimizing Ldistill for a total of 20000 epochs with the Adam optimizer (Kingma 2014) and a learning rate of 10⁻³.' |
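The greedy loop described in the Pseudocode row (walk the layers from last to first, ablate each component, and keep the ablation whenever validation performance stays above the threshold α) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `extract_circuit`, `model_eval`, and the component identifiers are all hypothetical names introduced here.

```python
# Hypothetical sketch of the Algorithm 1 loop: greedily ablate components
# from the last layer to the first, keeping an ablation only if the pruned
# model's validation score stays within a factor alpha of the baseline.

def extract_circuit(model_eval, components, alpha, baseline_score):
    """
    model_eval(pruned): validation score of the model with the components
        in `pruned` ablated (assumed callable, provided by the caller).
    components: component ids ordered from the last layer to the first.
    Returns the list of components that were safely ablated; the remaining
    components form the task-specific circuit.
    """
    pruned = []
    for comp in components:  # last layer -> first layer
        candidate = pruned + [comp]
        if model_eval(candidate) >= alpha * baseline_score:
            pruned = candidate  # ablating this component preserved the task
    return pruned
```

With α close to 1 the loop only removes components whose ablation barely affects the task, which is what makes the surviving circuit both small and interpretable.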
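The two ablation schemes named as hyperparameters (zero vs. mean ablation) differ only in what replaces a component's activations: zeros, or the component's mean activation over the patching dataset Da. A small NumPy sketch, with assumed array shapes and function names (not taken from the paper's code):

```python
import numpy as np

# Hypothetical illustration of zero vs. mean ablation. `activations` is the
# component's output for the current batch, shape (batch, d); the mean for
# mean ablation is taken over the patching dataset's activations.

def ablate(activations, patching_activations, scheme="mean"):
    if scheme == "zero":
        # Zero ablation: silence the component entirely.
        return np.zeros_like(activations)
    if scheme == "mean":
        # Mean ablation: replace every row with the mean activation
        # computed over the patching dataset.
        mean_act = patching_activations.mean(axis=0, keepdims=True)
        return np.broadcast_to(mean_act, activations.shape).copy()
    raise ValueError(f"unknown ablation scheme: {scheme}")
```

Mean ablation is generally considered gentler than zero ablation because it keeps the component's output in-distribution, which is consistent with the review's note that mean ablation is used across all runs.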