Debiasing Mini-Batch Quadratics for Applications in Deep Learning

Authors: Lukas Nicola Tatzel, Bálint Mucsányi, Osane Hackel, Philipp Hennig

ICLR 2025

Reproducibility Assessment (variable, result, and supporting excerpt from the paper)
Research Type: Experimental
Evidence: "In this paper, we (i) show that this bias introduces a systematic error, (ii) provide a theoretical explanation for it, (iii) explain its relevance for second-order optimization and uncertainty quantification via the Laplace approximation in deep learning, and (iv) develop and evaluate debiasing strategies. [...] 6 EXPERIMENTS In this section, we evaluate the effectiveness of the debiasing strategies from Section 4.2."
Researcher Affiliation: Academia
Evidence: "Lukas Tatzel, Bálint Mucsányi, Osane Hackel & Philipp Hennig, Tübingen AI Center, University of Tübingen, Tübingen, Germany, EMAIL"
Pseudocode: Yes
Evidence: "Algorithm 1: Method of conjugate gradients (CG), based on (Nocedal & Wright, 2006, Alg. 5.2)"
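The cited Algorithm 1 is the standard conjugate-gradient iteration for solving Ax = b with a symmetric positive-definite A. A minimal pure-Python sketch is given below; this is an illustration of the textbook method, not the authors' implementation. The matrix is passed as a matrix-vector-product callable, which matches the deep-learning setting where the (mini-batch) Hessian is only accessible through Hessian-vector products:

```python
def cg(matvec, b, tol=1e-10, max_iter=None):
    """Conjugate gradients for A x = b, with A given only via `matvec(v) = A @ v`.

    Assumes A is symmetric positive definite. Returns the approximate solution
    once the residual norm falls below `tol` or after `max_iter` iterations
    (defaults to the problem dimension, the exact-arithmetic worst case).
    """
    n = len(b)
    if max_iter is None:
        max_iter = n
    x = [0.0] * n          # start from the zero vector
    r = list(b)            # residual r = b - A x = b for x = 0
    p = list(r)            # initial search direction
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new ** 0.5 < tol:
            break
        # New direction: residual plus a correction keeping directions A-conjugate.
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

For a 2x2 system the method terminates in at most two iterations, which makes the sketch easy to check against a direct solve.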
Open Source Code: No
Evidence: The paper discusses using existing open-source tools like DEEPOBS (Schneider et al., 2019), PYTORCH (Paszke et al., 2019), and BACKPACK (Dangel et al., 2020), but it does not provide an explicit statement or link for the authors' own implementation code for the methodology described.
Open Datasets: Yes
Evidence: "We use the datasets CIFAR-10 (with C = 10 classes) and CIFAR-100 (with C = 100 classes) (Krizhevsky, 2009). [...] We also use the IMAGENET dataset (Deng et al., 2009) which contains images from C = 1000 different classes."
Dataset Splits: Yes
Evidence: "Each dataset contains 60,000 data points that are split into 40,000 training samples, 10,000 validation samples and 10,000 test samples. For the experiments on out-of-distribution (OOD) data, we create the datasets CIFAR-10-C and CIFAR-100-C, each containing 10,000 images, as described in (Hendrycks & Dietterich, 2019)."
Hardware Specification: No
Evidence: The paper describes various experimental setups, models, and training procedures, but it does not specify any particular hardware components such as GPU models, CPU types, or memory amounts used for running the experiments.
Software Dependencies: No
Evidence: "We use DEEPOBS (Schneider et al., 2019) on top of PYTORCH (Paszke et al., 2019) as our general benchmarking framework as it provides easy access to a variety of datasets and model architectures. [...] Our implementation uses BACKPACK (Dangel et al., 2020) that provides access to products with the Hessian..."
Experiment Setup: Yes
Evidence: "(A) ALL-CNN-C on CIFAR-100. [...] We train the model with SGD (learning rate 0.171234) with batch size 256 for 350 epochs. Weight decay β = 0.0005 is used on the weights but not the biases of the model. (B) ALL-CNN-C on CIFAR-10. [...] The model is trained with SGD (learning rate 0.025, momentum 0.9) with batch size 256 for 350 epochs. The learning rate is reduced by a factor of 10 at epochs 200, 250 and 300, as suggested in (Springenberg et al., 2015). Weight decay β = 0.001 is used on the weights but not the biases of the model."
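The step schedule in setup (B) — base learning rate 0.025, reduced by a factor of 10 at epochs 200, 250, and 300 — can be sketched as a small helper function. This is an illustrative reconstruction of the stated schedule (the function name and interface are not from the paper; in a PyTorch pipeline the equivalent would typically be `torch.optim.lr_scheduler.MultiStepLR`):

```python
def lr_at_epoch(epoch, base_lr=0.025, milestones=(200, 250, 300), factor=0.1):
    """Step schedule: multiply the learning rate by `factor` at each milestone epoch.

    Defaults reproduce setup (B): 0.025 for epochs 0-199, then 0.0025,
    0.00025, and 0.000025 from epochs 200, 250, and 300 onward.
    """
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= factor
    return lr
```

Note that the quoted setup applies weight decay to the weights but not the biases; in PyTorch this is usually realized by passing two parameter groups to the optimizer, one with `weight_decay` set and one with it zeroed.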