Streamlining Prediction in Bayesian Deep Learning
Authors: Rui Li, Marcus Klasson, Arno Solin, Martin Trapp
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We showcase our approach for both MLPs and transformers, such as ViT and GPT-2, and assess its performance on regression and classification tasks. ... Contributions: ... (iii) Finally, we present an empirical assessment of our approach on regression and classification tasks, and showcase its utility for uncertainty quantification, out-of-domain detection, and sensitivity analysis (Sec. 4). ... 4 EXPERIMENTS We demonstrate the practical applicability of our approach on classification/regression tasks (Sec. 4.1), large-scale classification results with ViT/GPT models (Sec. 4.2), and sensitivity estimation (Sec. 4.3). Additional experiments and experimental results can be found in App. B. |
| Researcher Affiliation | Academia | Rui Li Marcus Klasson Arno Solin Martin Trapp Department of Computer Science, Aalto University, Finland {firstname.lastname}@aalto.fi |
| Pseudocode | No | The paper describes its methodology using mathematical equations and textual descriptions but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Open-source library: https://github.com/AaltoML/SUQ. |
| Open Datasets | Yes | Data sets We use a selection of data sets from the UCI repository (Kelly et al., 2023) for the regression experiments. For classification, we experiment on MNIST (LeCun et al., 1998), FMNIST (Xiao et al., 2017), as well as the 11-class data sets OrganCMNIST and OrganSMNIST from MedMNIST (Yang et al., 2023). ... We experiment with CIFAR-10 and CIFAR-100 (Krizhevsky & Hinton, 2009), DTD (Cimpoi et al., 2014), RESISC (Cheng et al., 2017) and a subsampled version of ImageNet-R (Hendrycks et al., 2021) ... For the GPT model, we used the BOOLQ, WIC, and MRPC tasks from the GLUE (Wang et al., 2019b) and SuperGLUE (Wang et al., 2019a) benchmarks. |
| Dataset Splits | Yes | Regression We experiment on a selection of data sets from the UCI repository and run a 5-fold cross validation to report results for each data set. ... For our method, we fit an additional scaling factor on the predictive variance by minimising the NLPD on a validation set... |
| Hardware Specification | Yes | We acknowledge CSC IT Center for Science, Finland, for awarding this project access to the LUMI supercomputer, owned by the EuroHPC Joint Undertaking, hosted by CSC (Finland) and the LUMI consortium through CSC. ... B.7 RUNTIME EXPERIMENT ... We ran experiments on an NVIDIA H100 80GB GPU for 400 data points, a batch size of one, and for each data point we repeated the measurement ten times. |
| Software Dependencies | No | The paper mentions several software components like Hugging Face Transformers, torch-laplace library, and IVON, but it does not specify exact version numbers for these dependencies. |
| Experiment Setup | Yes | Posterior approximations ... For the MFVI and LA sampling baselines, we used 1,000 MC samples in the regression and classification experiments in Sec. 4.1, and 50 MC samples for the ViT and GPT-2 in Sec. 4.2. ... B.3 IMAGE PIXEL SENSITIVITY We trained a 4-layer MLP classifier on MNIST digits zero and eight using a batch size of 64, a learning rate of 1e-3, weight decay set to 1e-5, and for 50 epochs. ... The optimisation was performed for each image independently and using Adam with a learning rate of 5e-3 until the validation loss dropped below a divergence to the initial loss of 1e-2. |
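The "Dataset Splits" row quotes the paper's calibration step: fitting an additional scaling factor on the predictive variance by minimising the NLPD on a validation set. A minimal sketch of that idea, assuming a Gaussian predictive distribution with per-point mean `mu` and variance `var` (the function names and the closed-form minimiser are illustrative, not taken from the paper's SUQ library):

```python
import numpy as np

def fit_variance_scale(mu, var, y):
    """Fit a scalar s > 0 so that s * var minimises the average Gaussian
    NLPD on validation targets y. For a Gaussian predictive density the
    minimiser has the closed form s = mean((y - mu)^2 / var)."""
    return float(np.mean((y - mu) ** 2 / var))

def gaussian_nlpd(mu, var, y):
    """Average Gaussian negative log predictive density."""
    return float(np.mean(0.5 * np.log(2 * np.pi * var)
                         + (y - mu) ** 2 / (2 * var)))

# Toy check: an over-confident model (variance too small) improves
# after rescaling on held-out data.
rng = np.random.default_rng(0)
mu = np.zeros(400)
y = rng.normal(0.0, 2.0, size=400)   # true noise std is 2.0
var = np.full(400, 0.5)              # under-estimated variance
s = fit_variance_scale(mu, var, y)
assert gaussian_nlpd(mu, s * var, y) <= gaussian_nlpd(mu, var, y)
```

A single multiplicative factor keeps the ranking of predictive uncertainties intact and only adjusts their overall magnitude, which is why it can be fit on a small validation set without risk of overfitting.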