Explaining Bayesian Neural Networks
Authors: Kirill Bykov, Marina M.-C. Höhne, Adelaida Creosteanu, Klaus-Robert Müller, Frederick Klauschen, Shinichi Nakajima, Marius Kloft
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper states: 'Quantitative and qualitative experiments on toy and benchmark data, as well as on a real-world pathology dataset, illustrate that our framework enriches standard explanations with uncertainty information and may support the visualization of explanation stability.' It includes sections such as '4 Evaluation procedure' and '5 Experiments' detailing empirical studies and presenting results in tables and figures. |
| Researcher Affiliation | Collaboration | The authors list affiliations with universities and research institutes (e.g., TU Berlin, ATB Potsdam, University of Potsdam, Korea University, Max Planck Institute for Informatics, Charité Universitätsmedizin, RPTU, BIFOLD, RIKEN AIP) which are academic. However, Adelaida Creosteanu is affiliated with 'Aignostics, Berlin, Germany', which is an industry company. This mix indicates a collaborative affiliation. |
| Pseudocode | No | The paper describes methods and procedures in narrative text, such as in Section 3, 'Explaining Bayesian Neural Networks,' and Appendix A.2, 'Details of the clustering procedure,' without presenting them in structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | In the introduction, footnote 1 states: 'Code and reproduction instructions are available at https://github.com/lapalap/explaining-bnn.' |
| Open Datasets | Yes | The paper references and uses several publicly available datasets, including 'Fashion MNIST (Xiao et al., 2017)', the 'Custom MNIST (CMNIST) dataset (Bykov et al., 2021)', 'ImageNet (Russakovsky et al., 2015)', and 'Pascal VOC 2007'. These are well-known benchmarks with provided citations. |
| Dataset Splits | Yes | For the medical data use case, the paper states: 'Data are split into training and test partitions. The training set contains 17,884 labeled patches, with 69.9% non-cancer.' This provides specific information about the dataset partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory amounts used for running its experiments. It only mentions a '3DHistech P1000 whole-slide scanner' in the context of data acquisition, which is not the hardware used for computation. |
| Software Dependencies | No | While the paper mentions the use of frameworks like PyTorch (implicitly through references to 'torch.optim' and 'torch.nn') and scikit-learn ('sklearn.metrics'), it does not specify concrete version numbers for these or any other key software components, which is required for reproducibility. |
| Experiment Setup | Yes | The paper provides extensive details regarding the experimental setup across multiple experiments. For example, for the CMNIST experiment, it states: 'For the Dropout scenario, after 2 Average Pooling layers, a 2-d Dropout layer was inserted with a dropout probability set to 0.25. 2 1D Dropout layers were added in the classification part of the network, with the probability of dropout set to 0.5. All of the networks were trained with a batch size of 32, and with a Stochastic Gradient Descent algorithm, (Léon, 1998) with a learning rate of 0.01 and 0.9 momentum. A learning rate scheduler was used with the number of steps set to 7 and multiplicative parameter γ = 0.1. For the Ensemble scenario, 100 networks were trained for 20 epochs, and for the Laplace and Dropout, the number of epochs was set to 100. For the Laplace approximation, KFAC Laplace approximation was used (Humt et al., 2020; Lee et al., 2020), with Laplace regularization hyperparameters (additive and multiplicative) both set to 0.1.' |
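The learning-rate schedule quoted in the Experiment Setup row (SGD with a base rate of 0.01, a step scheduler with 7 steps, and multiplicative parameter γ = 0.1) can be reconstructed as a short sketch. The function below is illustrative, not taken from the paper's released code; it mirrors the semantics of a standard step-decay scheduler such as `torch.optim.lr_scheduler.StepLR` under the stated hyperparameters.

```python
def step_lr(base_lr: float, epoch: int, step_size: int = 7, gamma: float = 0.1) -> float:
    """Learning rate in effect at `epoch` under a step-decay schedule:
    the rate is multiplied by `gamma` every `step_size` epochs."""
    return base_lr * gamma ** (epoch // step_size)

# Under the reported 20-epoch Ensemble training run, the rate decays twice:
# epochs 0-6 use 0.01, epochs 7-13 use 0.001, epochs 14-19 use 0.0001.
schedule = [step_lr(0.01, e) for e in range(20)]
```

This makes explicit that, for the 100-epoch Laplace and Dropout runs, the effective learning rate would become vanishingly small well before training ends, a detail the paper's description leaves implicit.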