A Rigorous Study Of The Deep Taylor Decomposition

Authors: Leon Sixt, Tim Landgraf

TMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In an empirical evaluation (Section 5), we applied the theoretical insights from the previous section and studied the train-free DTD approximation in several experiments: (C5) the train-free DTD does not enforce the root points to be located in the valid local linear region of the network; (C6) we also validated this empirically using a small multilayered perceptron, where we found a substantial number of samples with roots located outside the valid local linear region; (C7) additionally, we include a reproducibility study of Arras et al. (2022), which claimed that DTD's explanations would not suffer from the problems reported in Sixt et al. (2020). This reproducibility study also highlights DTD's black-box character and how difficult it is to evaluate explanation quality empirically.
Researcher Affiliation | Academia | Leon Sixt (EMAIL), Department of Computer Science, Freie Universität Berlin; Tim Landgraf (EMAIL), Department of Computer Science, Freie Universität Berlin
Pseudocode | Yes | An exemplary pseudo-code can be found in Algorithm 1. Before continuing with the approximations of the Deep Taylor Decomposition, we want to make a few remarks:
Open Source Code | Yes | Code repository: https://github.com/berleon/A-Rigorous-Study-Of-The-Deep-Taylor-Decomposition
Open Datasets | Yes | A recent work (Arras et al., 2022) evaluated different saliency methods on the CLEVR VQA dataset using ground-truth segmentation masks. Interestingly, they found LRPα1β0 (equivalent to the DTD z⁺-rule) to highlight the object of interest particularly well: "[...] a high connection between the relevance heatmaps and the target objects of each question". This finding seems to contradict Sixt et al. (2020), which found that LRPα1β0 becomes independent of the network's deeper layers. In Arras et al. (2022), it was therefore concluded: "Maybe the phenomenon described in (Sixt et al., 2020) becomes predominant in the asymptotic case of a neural network with a very high number of layers, [...]". A simple empirical test would have been to check whether LRPα1β0's saliency maps change when the network's last layer is changed. To perform this test, we replicated their setup and trained a relation network (Santoro et al., 2017) on the CLEVR V1.0 dataset (Johnson et al., 2017). The network reached an accuracy of 93.47%, comparable to 93.3% (Arras et al., 2022) and 95.5% (Santoro et al., 2017). We included more details about the model in Appendix B.
Dataset Splits | No | The paper uses the CLEVR dataset and mentions evaluating on "1000 LRPα1β0's saliency maps" but does not provide specific details on how the dataset was split into training, validation, or test sets, such as percentages or absolute counts.
Hardware Specification | No | The computations were done on the Curta cluster provided by the ZEDAT, Freie Universität Berlin (Bennett et al., 2020).
Software Dependencies | No | We tested our implementation against Captum's implementation of the DTD (Kokhlikyan et al., 2020) and found the deviation to be less than 1 × 10⁻⁸.
Experiment Setup | No | The paper provides some architectural details for a small network (3 linear layers, ReLU, 10 dimensions) and mentions random weight initialization with non-positive biases. For the CLEVR dataset, it states a relation network was trained to 93.47% accuracy. However, it does not provide concrete hyperparameter values such as learning rate, batch size, number of epochs, or specific optimizer settings for either experiment.
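The (C5)/(C6) evidence above concerns whether a root point lies in the input's valid local linear region. For a ReLU network, that region is characterized by the ReLU activation pattern, so the check can be sketched in a few lines. The network below (3 linear layers, 10 units, non-positive biases) mirrors only the sizes quoted in the Experiment Setup row; the weights are hypothetical stand-ins, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-layer ReLU MLP with 10 units per layer and
# non-positive biases, as described in the evidence above.
Ws = [rng.normal(size=(10, 10)) for _ in range(3)]
bs = [-np.abs(rng.normal(size=10)) for _ in range(3)]

def activation_pattern(x):
    """Concatenated on/off pattern of every ReLU in the network at input x."""
    pattern, h = [], x
    for W, b in zip(Ws, bs):
        z = W @ h + b
        pattern.append(z > 0)
        h = np.maximum(z, 0)
    return np.concatenate(pattern)

def root_in_valid_region(x, root):
    """A first-order Taylor expansion at `root` is only exact for `x` if both
    points share the same ReLU activation pattern (same local linear region)."""
    return bool(np.array_equal(activation_pattern(x), activation_pattern(root)))

x = rng.normal(size=10)
print(root_in_valid_region(x, x))   # a point always lies in its own region
print(root_in_valid_region(x, -x))  # a distant candidate root may not
```

Counting, over many samples, how often a method's root points fail this check is exactly the kind of empirical validation (C6) describes.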
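The "simple empirical test" quoted in the Open Datasets row, checking whether LRPα1β0's saliency maps react to a change of the last layer, can be sketched on a toy ReLU network. The α1β0 rule redistributes relevance along positive contributions only; everything below (layer sizes, weights, the two stand-in networks) is an illustrative assumption, not the paper's relation network:

```python
import numpy as np

rng = np.random.default_rng(0)

def lrp_alpha1beta0(Ws, x):
    """Propagate relevance through a bias-free ReLU network with the
    alpha1-beta0 rule: only positive contributions a_j * max(w_kj, 0)
    are used to redistribute relevance to the lower layer."""
    # Forward pass, caching the activations of every layer.
    acts = [x]
    for W in Ws:
        acts.append(np.maximum(W @ acts[-1], 0))
    # Start the backward pass from the output activations as relevance.
    R = acts[-1]
    for W, a in zip(reversed(Ws), reversed(acts[:-1])):
        zp = np.maximum(W, 0) * a                    # positive contributions
        denom = zp.sum(axis=1, keepdims=True) + 1e-12
        R = (zp / denom).T @ R                       # redistribute downward
    return R

# Two hypothetical networks that share all layers except the last one.
shared = [rng.uniform(size=(10, 10)) for _ in range(3)]
net_a = shared + [rng.uniform(size=(5, 10))]
net_b = shared + [rng.uniform(size=(5, 10))]

x = rng.uniform(size=10)
sal_a = lrp_alpha1beta0(net_a, x)
sal_b = lrp_alpha1beta0(net_b, x)

# If the two saliency maps are (nearly) identical, the explanation ignores
# the last layer -- the effect reported by Sixt et al. (2020).
cos = sal_a @ sal_b / (np.linalg.norm(sal_a) * np.linalg.norm(sal_b))
print(f"cosine similarity of the two saliency maps: {cos:.3f}")
```

The same comparison on the trained relation network, with a replaced last layer, is what the row above proposes as a direct check of the Arras et al. (2022) conclusion.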
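The deviation bound quoted in the Software Dependencies row is straightforward to script as a regression test against a reference implementation. A minimal sketch, with placeholder attribution functions standing in for one's own DTD code and for Captum's (neither is the paper's actual code):

```python
import numpy as np

def own_attribution(x):
    # Placeholder for one's own DTD implementation.
    return np.tanh(x) * x

def reference_attribution(x):
    # Placeholder for a reference implementation, e.g. Captum's DTD.
    return x * np.tanh(x)

# Maximum absolute deviation over a batch of test inputs.
x = np.linspace(-3.0, 3.0, 101)
deviation = np.abs(own_attribution(x) - reference_attribution(x)).max()
print(deviation < 1e-8)  # the tolerance reported in the paper
```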