Visualizing the Diversity of Representations Learned by Bayesian Neural Networks

Authors: Dennis Grinwald, Kirill Bykov, Shinichi Nakajima, Marina MC Höhne

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In our experiments, we use four data sets, Places365 (Zhou et al. (2017)), CIFAR-100 (Krizhevsky et al. (2009)), STL-10 (Coates et al. (2011b)), and SVHN (Yuval (2011)), of which some statistics are listed in Table 1." ... "Here, we will visually and quantitatively compare the learned representations of models that were trained using different Bayesian inference methods."
Researcher Affiliation | Academia | Dennis Grinwald: Machine Learning Group, Technical University of Berlin, Germany; BIFOLD (Berlin Institute for the Foundations of Learning and Data), Berlin, Germany. Kirill Bykov: Understandable Machine Intelligence Lab, Leibniz Institute for Agriculture and Bioeconomy (ATB), Potsdam, Germany; Technical University of Berlin; BIFOLD. Shinichi Nakajima: Machine Learning Group, Technical University of Berlin; BIFOLD; RIKEN Center for AIP, Tokyo, Japan. Marina M.-C. Höhne: Understandable Machine Intelligence Lab, ATB, Potsdam; Technical University of Berlin; BIFOLD.
Pseudocode | No | The general idea of AM is to artificially generate an input that maximizes the activation of a particular neuron in a certain layer of a neural network. The optimization problem can be formulated as: v̂ = argmax_{v ∈ ℝ^V} a(v) + R(v) (Eq. 6). No structured pseudocode or algorithm blocks are present; the paper describes its methods through mathematical formulations and textual descriptions.
Open Source Code | No | No concrete access to custom source code for the methodology described in the paper is provided. The paper references third-party implementations: "for which we used the PyTorch (Paszke et al., 2019) version of the published source code available at https://github.com/greentfrapp/lucent" and "For the contrastive learning, we used the implementation of the following public repository available at https://github.com/Yunfan-Li/Contrastive-Clustering". These are not the authors' own implementation of their full analysis pipeline.
Open Datasets | Yes | "In our experiments, we use four data sets, Places365 (Zhou et al. (2017)), CIFAR-100 (Krizhevsky et al. (2009)), STL-10 (Coates et al. (2011b)), and SVHN (Yuval (2011)), of which some statistics are listed in Table 1."
Dataset Splits | Yes | Table 1: Data sets used in the experiments.
Data set | Image size | Classes | Training size | Test size
Places365 | 256×256×3 | 365 | 1.4M | 365k
CIFAR-100 | 32×32×3 | 100 | 50k | 10k
STL-10 | 96×96×3 | 10 | 5k | 8k
SVHN | 32×32×3 | 10 | 73k | 26k
Hardware Specification | No | No specific hardware details (GPU model, CPU model, or memory) are mentioned for running the experiments; the paper only describes the models and datasets used.
Software Dependencies | No | No specific version numbers for key software components are provided. The paper mentions "PyTorch (Paszke et al., 2019)" but does not specify a version (e.g., 1.x). Other tools such as SGD, KMeans, and t-SNE are mentioned without version details.
Experiment Setup | Yes | "We solve the AM optimization problem (6) by 512 steps of gradient descent with the step size of α = 0.05. For regularization, we apply the transformation robustness with random rotation, random scaling, and random jittering." ... "The number of augmented samples is M = 2 (per original sample and epoch)... We use stochastic gradient descent (SGD), where the contrastive loss (8) with the batch size N and the temperature τ = 0.5 is minimized in each epoch. We run SGD for 150 epochs on the CIFAR-100, STL-10, and SVHN models, and for 250 epochs on the Places365 models."