Extracting Local Reasoning Chains of Deep Neural Networks
Authors: Haiyan Zhao, Tianyi Zhou, Guodong Long, Jing Jiang, Chengqi Zhang
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide detailed and insightful case studies together with several quantitative analyses over thousands of trials to demonstrate the quality, sparsity, fidelity and accuracy of the interpretation. In extensive empirical studies on VGG, ResNet, and ViT, NeuroChains significantly enriches the interpretation and makes the inner mechanism of DNNs more transparent. |
| Researcher Affiliation | Academia | Haiyan Zhao EMAIL University of Technology Sydney Tianyi Zhou EMAIL University of Maryland Guodong Long EMAIL University of Technology Sydney Jing Jiang EMAIL University of Technology Sydney Chengqi Zhang EMAIL University of Technology Sydney |
| Pseudocode | No | The paper describes the algorithm and optimization problem using mathematical equations and descriptive text, but it does not contain a formally labeled 'Pseudocode' or 'Algorithm' block with structured steps. |
| Open Source Code | No | The paper mentions 'NeuroChains' as the developed tool but does not contain any explicit statement about releasing source code for this methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | In experiments, we apply NeuroChains to extract the inference chains of widely-adopted VGG-19, ResNet-50, and ViT which are all pre-trained on ImageNet. |
| Dataset Splits | Yes | In every case study, we firstly randomly sample 2 classes in ImageNet and then randomly sample 10 images from each class's images. Note that the sampled images may be wrongly classified to other classes by the original DNN. We apply inference on those 20 images and their outputs are used in solving the optimization of Eq (8) in order to extract the local inference chain in the form of a sub-network. ... The validation-set accuracy of sub-networks vs. the number of training samples per class in each sub-task. For both pre-trained VGG-19 and ResNet-50, when the number of samples per class is too small (≤ 5), the sub-network tends to be overfitting and cannot generalize well to unseen validation data. However, when the number of samples per class increases to 10, the sub-network starts to achieve promising validation-set accuracy. |
| Hardware Specification | Yes | It costs only 90s for VGG-19 and 55s for ResNet-50 to extract a sub-network on a single RTX 6000 GPU since we only optimize a small number of scores. |
| Software Dependencies | No | We implement NeuroChains by PyTorch (Paszke et al., 2017). A specific version number for PyTorch is not provided, nor are other software dependencies or their versions. |
| Experiment Setup | Yes | We use the Adam optimizer for the optimization of Eq (8) for filter/layer scores, with a fixed learning rate of 0.005. We set temperature T = 0.2 in the sigmoid gate (Eq. (6)) to encourage the value Gℓ close to either 0 or 1, and the threshold τ on gate scores is set to 0.1 so that the outputs of sub-networks are consistent. We only tried a limited number of choices on tens of experiments, chose the best combination balancing the fidelity and sub-network size, and then applied it to all other experiments without further tuning. In particular, we tried τ ∈ {0.01, 0.1, 0.5}, λ ∈ {0.001, 0.005, 0.01, 0.1}, and λg ∈ {1, 2, 5}. For different models, the weights of the two penalties in Eq. (8) are different: for VGG-19 we use λ = 0.005 and λg = 2, while for ResNet-50 we choose λ = 0.005 and λg = 1. This choice performs consistently well and robustly on all other experiments. Training runs for 300 iterations, and we stop early when the loss difference is quite small, i.e., less than 0.05. |
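The gating setup quoted above (sigmoid gate with temperature T = 0.2, threshold τ = 0.1 on gate values) can be sketched as follows. This is a minimal illustration only, not the paper's released code; the function names and the toy score values are hypothetical, and the exact forms of Eq. (6) and Eq. (8) are not reproduced here.

```python
import numpy as np

def sigmoid_gate(scores, T=0.2):
    # Sigmoid gate over learned filter scores; a low temperature T
    # sharpens the sigmoid so gate values are pushed toward 0 or 1.
    return 1.0 / (1.0 + np.exp(-scores / T))

def select_filters(scores, T=0.2, tau=0.1):
    # Keep filters whose gate value exceeds the threshold tau,
    # yielding the binary mask that defines the sub-network.
    return sigmoid_gate(scores, T) > tau

# Hypothetical filter scores for illustration.
scores = np.array([-1.0, -0.1, 0.05, 0.8])
mask = select_filters(scores)  # strongly negative scores are gated out
```

In practice the scores themselves would be optimized with Adam against the fidelity loss plus the two sparsity penalties weighted by λ and λg; only the hard thresholding step is shown here.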