Debiasing Mini-Batch Quadratics for Applications in Deep Learning
Authors: Lukas Nicola Tatzel, Bálint Mucsányi, Osane Hackel, Philipp Hennig
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we (i) show that this bias introduces a systematic error, (ii) provide a theoretical explanation for it, (iii) explain its relevance for second-order optimization and uncertainty quantification via the Laplace approximation in deep learning, and (iv) develop and evaluate debiasing strategies. [...] (Section 6, Experiments) In this section, we evaluate the effectiveness of the debiasing strategies from Section 4.2. |
| Researcher Affiliation | Academia | Lukas Tatzel, Bálint Mucsányi, Osane Hackel & Philipp Hennig Tübingen AI Center University of Tübingen Tübingen, Germany EMAIL |
| Pseudocode | Yes | Algorithm 1: Method of conjugate gradients (CG), based on (Nocedal & Wright, 2006, Alg. 5.2) |
| Open Source Code | No | The paper discusses using existing open-source tools like DEEPOBS (Schneider et al., 2019), PYTORCH (Paszke et al., 2019), and BACKPACK (Dangel et al., 2020), but it does not provide an explicit statement or link for the authors' own implementation code for the methodology described. |
| Open Datasets | Yes | We use the datasets CIFAR-10 (with C = 10 classes) and CIFAR-100 (with C = 100 classes) (Krizhevsky, 2009). [...] We also use the IMAGENET dataset (Deng et al., 2009) which contains images from C = 1000 different classes. |
| Dataset Splits | Yes | Each dataset contains 60,000 data points that are split into 40,000 training samples, 10,000 validation samples and 10,000 test samples. For the experiments on out-of-distribution (OOD) data, we create the datasets CIFAR-10-C and CIFAR-100-C, each containing 10,000 images, as described in (Hendrycks & Dietterich, 2019). |
| Hardware Specification | No | The paper describes various experimental setups, models, and training procedures, but it does not specify any particular hardware components such as GPU models, CPU types, or memory amounts used for running the experiments. |
| Software Dependencies | No | We use DEEPOBS (Schneider et al., 2019) on top of PYTORCH (Paszke et al., 2019) as our general benchmarking framework as it provides easy access to a variety of datasets and model architectures. [...] Our implementation uses BACKPACK (Dangel et al., 2020) that provides access to products with the Hessian... |
| Experiment Setup | Yes | (A) ALL-CNN-C on CIFAR-100. [...] We train the model with SGD (learning rate 0.171234) with batch size 256 for 350 epochs. Weight decay β = 0.0005 is used on the weights but not the biases of the model. (B) ALL-CNN-C on CIFAR-10. [...] The model is trained with SGD (learning rate 0.025, momentum 0.9) with batch size 256 for 350 epochs. The learning rate is reduced by a factor of 10 at epochs 200, 250 and 300, as suggested in (Springenberg et al., 2015). Weight decay β = 0.001 is used on the weights but not the biases of the model. |
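The pseudocode row cites Algorithm 1, the method of conjugate gradients (CG) after Nocedal & Wright (2006, Alg. 5.2). A minimal sketch of that standard algorithm is below; `matvec` is a placeholder for the Hessian-vector products the paper obtains via BackPACK, and all names here are illustrative, not the authors' implementation.

```python
import numpy as np

def conjugate_gradients(matvec, b, x0=None, tol=1e-8, max_iter=None):
    """Solve A x = b for symmetric positive-definite A given only
    matrix-vector products A v (here: `matvec`)."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = matvec(x) - b                 # residual r_0 = A x_0 - b
    p = -r                            # initial search direction
    rs = r @ r
    max_iter = len(b) if max_iter is None else max_iter
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:         # converged
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)         # exact step length along p
        x = x + alpha * p
        r = r + alpha * Ap
        rs_new = r @ r
        p = -r + (rs_new / rs) * p    # next A-conjugate direction
        rs = rs_new
    return x
```

In exact arithmetic, CG converges in at most `len(b)` iterations, which is why that is the default cap.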
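The stepwise learning-rate schedule quoted for setup (B) (base rate 0.025, divided by 10 at epochs 200, 250 and 300) can be sketched as a plain function; the function name and defaults are illustrative, with the milestone values taken from the quoted setup.

```python
def learning_rate(epoch, base_lr=0.025, milestones=(200, 250, 300), factor=0.1):
    """Step schedule: multiply the base rate by `factor` once for
    every milestone epoch that has been reached."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= factor
    return lr
```

For example, `learning_rate(0)` returns 0.025 and `learning_rate(300)` returns 2.5e-5, matching three reductions by a factor of 10.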