A Rigorous Study Of The Deep Taylor Decomposition

Authors: Leon Sixt, Tim Landgraf

TMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In an empirical evaluation (Section 5), we applied the theoretical insights from the previous section and studied the train-free DTD approximation in several experiments: (C5) the train-free DTD does not enforce the root points to be located in the valid local linear region of the network; (C6) we also validated this empirically using a small multilayered perceptron, where we found a substantial number of samples with roots located outside the valid local linear region; (C7) additionally, we include a reproducibility study of Arras et al. (2022), which claimed that DTD's explanations would not suffer from the problems reported in Sixt et al. (2020). This reproducibility study also highlights DTD's black-box character and how difficult it is to evaluate explanation quality empirically.
Researcher Affiliation | Academia | Leon Sixt (EMAIL), Department of Computer Science, Freie Universität Berlin; Tim Landgraf (EMAIL), Department of Computer Science, Freie Universität Berlin
Pseudocode | Yes | An exemplary pseudo-code can be found in Algorithm 1. Before continuing with the approximations of the Deep Taylor Decomposition, we want to make a few remarks:
Open Source Code | Yes | Code repository: https://github.com/berleon/A-Rigorous-Study-Of-The-Deep-Taylor-Decomposition
Open Datasets | Yes | A recent work (Arras et al., 2022) evaluated different saliency methods on the CLEVR VQA dataset using ground-truth segmentation masks. Interestingly, they found LRPα1β0 (equivalent to the DTD z⁺-rule) to highlight the object of interest particularly well: "[...] a high connection between the relevance heatmaps and the target objects of each question". This finding seems to contradict Sixt et al. (2020), which found that LRPα1β0 becomes independent of the network's deeper layers. In Arras et al. (2022), it was therefore concluded: "Maybe the phenomenon described in (Sixt et al., 2020) becomes predominant in the asymptotic case of a neural network with a very high number of layers, [...]". A simple empirical test would have been to check whether LRPα1β0's saliency maps change when the network's last layer is changed. To perform this test, we replicated their setup and trained a relation network (Santoro et al., 2017) on the CLEVR V1.0 dataset (Johnson et al., 2017). The network reached an accuracy of 93.47%, comparable to 93.3% (Arras et al., 2022) and 95.5% (Santoro et al., 2017). We included more details about the model in Appendix B.
Dataset Splits | No | The paper uses the CLEVR dataset and mentions evaluating on "1000 LRPα1β0's saliency maps" but does not provide specific details on how the dataset was split into training, validation, or test sets, such as percentages or absolute counts.
Hardware Specification | No | The computations were done on the Curta cluster provided by the ZEDAT, Freie Universität Berlin (Bennett et al., 2020).
Software Dependencies | No | We tested our implementation against Captum's implementation of the DTD (Kokhlikyan et al., 2020) and found the deviation to be less than 1 × 10⁻⁸.
Experiment Setup | No | The paper provides some architectural details for a small network (3 linear layers, ReLU, 10 dimensions) and mentions random weight initialization with non-positive biases. For the CLEVR dataset, it states a relation network was trained to 93.47% accuracy. However, it does not provide concrete hyperparameter values such as learning rate, batch size, number of epochs, or specific optimizer settings for either experiment.
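The (C5)/(C6) evidence above concerns whether a root point lies in the input's valid local linear region. For a ReLU network, that region is characterized by the ReLU activation pattern, so the check can be sketched in a few lines. The network below (3 linear layers, 10 units, non-positive biases) mirrors only the sizes quoted in the Experiment Setup row; the weights are hypothetical stand-ins, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-layer ReLU MLP with 10 units per layer and
# non-positive biases, as described in the evidence above.
Ws = [rng.normal(size=(10, 10)) for _ in range(3)]
bs = [-np.abs(rng.normal(size=10)) for _ in range(3)]

def activation_pattern(x):
    """Concatenated on/off pattern of every ReLU in the network at input x."""
    pattern, h = [], x
    for W, b in zip(Ws, bs):
        z = W @ h + b
        pattern.append(z > 0)
        h = np.maximum(z, 0)
    return np.concatenate(pattern)

def root_in_valid_region(x, root):
    """A first-order Taylor expansion at `root` is only exact for `x` if both
    points share the same ReLU activation pattern (same local linear region)."""
    return bool(np.array_equal(activation_pattern(x), activation_pattern(root)))

x = rng.normal(size=10)
print(root_in_valid_region(x, x))   # a point always lies in its own region
print(root_in_valid_region(x, -x))  # a distant candidate root may not
```

Counting, over many samples, how often a method's root points fail this check is exactly the kind of empirical validation (C6) describes.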
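The "simple empirical test" quoted in the Open Datasets row, checking whether LRPα1β0's saliency maps react to a change of the last layer, can be sketched on a toy ReLU network. The α1β0 rule redistributes relevance along positive contributions only; everything below (layer sizes, weights, the two stand-in networks) is an illustrative assumption, not the paper's relation network:

```python
import numpy as np

rng = np.random.default_rng(0)

def lrp_alpha1beta0(Ws, x):
    """Propagate relevance through a bias-free ReLU network with the
    alpha1-beta0 rule: only positive contributions a_j * max(w_kj, 0)
    are used to redistribute relevance to the lower layer."""
    # Forward pass, caching the activations of every layer.
    acts = [x]
    for W in Ws:
        acts.append(np.maximum(W @ acts[-1], 0))
    # Start the backward pass from the output activations as relevance.
    R = acts[-1]
    for W, a in zip(reversed(Ws), reversed(acts[:-1])):
        zp = np.maximum(W, 0) * a                    # positive contributions
        denom = zp.sum(axis=1, keepdims=True) + 1e-12
        R = (zp / denom).T @ R                       # redistribute downward
    return R

# Two hypothetical networks that share all layers except the last one.
shared = [rng.uniform(size=(10, 10)) for _ in range(3)]
net_a = shared + [rng.uniform(size=(5, 10))]
net_b = shared + [rng.uniform(size=(5, 10))]

x = rng.uniform(size=10)
sal_a = lrp_alpha1beta0(net_a, x)
sal_b = lrp_alpha1beta0(net_b, x)

# If the two saliency maps are (nearly) identical, the explanation ignores
# the last layer -- the effect reported by Sixt et al. (2020).
cos = sal_a @ sal_b / (np.linalg.norm(sal_a) * np.linalg.norm(sal_b))
print(f"cosine similarity of the two saliency maps: {cos:.3f}")
```

The same comparison on the trained relation network, with a replaced last layer, is what the row above proposes as a direct check of the Arras et al. (2022) conclusion.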
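The deviation bound quoted in the Software Dependencies row is straightforward to script as a regression test against a reference implementation. A minimal sketch, with placeholder attribution functions standing in for one's own DTD code and for Captum's (neither is the paper's actual code):

```python
import numpy as np

def own_attribution(x):
    # Placeholder for one's own DTD implementation.
    return np.tanh(x) * x

def reference_attribution(x):
    # Placeholder for a reference implementation, e.g. Captum's DTD.
    return x * np.tanh(x)

# Maximum absolute deviation over a batch of test inputs.
x = np.linspace(-3.0, 3.0, 101)
deviation = np.abs(own_attribution(x) - reference_attribution(x)).max()
print(deviation < 1e-8)  # the tolerance reported in the paper
```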