Towards Unbiased Calibration using Meta-Regularization

Authors: Cheng Wang, Jacek Golebiowski

TMLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate the effectiveness of the proposed approach in regularizing neural networks towards improved and unbiased calibration on three computer vision datasets. We empirically demonstrate that: (a) learning sample-wise γ as continuous variables can effectively improve calibration; (b) SECE smoothly optimizes γ-Net towards unbiased and robust calibration with respect to the binning schemes; and (c) the combination of γ-Net and SECE achieves the best calibration performance across various calibration metrics while retaining very competitive predictive performance as compared to multiple recently proposed methods.
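The paper's SECE objective is not reproduced here, but the idea of a smooth, binning-free calibration error can be illustrated with a generic kernel-smoothed estimate: accuracy at each confidence level is estimated with a Gaussian kernel (Nadaraya-Watson style) instead of hard bins, and the mean gap between confidence and smoothed accuracy is reported. This is a minimal sketch under that assumption, not the authors' implementation; the function name and the toy inputs are illustrative.

```python
import math

def kernel_smoothed_ece(confidences, correct, bandwidth=0.01):
    """Binning-free calibration error sketch (NOT the paper's exact SECE):
    for each sample, estimate accuracy at its confidence level via a
    Gaussian-kernel weighted average of correctness, then average the
    absolute confidence/accuracy gaps."""
    def k(u):
        # Gaussian kernel; the normalizing constant cancels in the ratio below.
        return math.exp(-0.5 * (u / bandwidth) ** 2)

    gaps = []
    for c in confidences:
        weights = [k(c - ci) for ci in confidences]
        # Kernel-weighted accuracy estimate at confidence level c.
        acc = sum(w * y for w, y in zip(weights, correct)) / sum(weights)
        gaps.append(abs(c - acc))
    return sum(gaps) / len(gaps)
```

Unlike binned ECE, this estimate varies smoothly with the predictions, which is what makes it usable as a differentiable training signal and robust to the choice of binning scheme.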
Researcher Affiliation Industry Cheng Wang, Amazon, Berlin, Germany; Jacek Golebiowski, Amazon, Berlin, Germany
Pseudocode Yes Algorithm 1 in Appendix describes the learning procedures.
Open Source Code No We implemented our methods by adapting and extending the code from (Bohdal et al., 2021) with PyTorch (Paszke et al., 2019).
Open Datasets Yes We conducted our experiments on CIFAR-10 and CIFAR-100 (in (Bohdal et al., 2021)) as well as Tiny-ImageNet (Ya Le, 2015).
Dataset Splits Yes For meta-learning, we split the training set into 8:1:1 as training/validation/meta-validation, keeping the original test sets untouched.
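The 8:1:1 split can be sketched with a seeded shuffle over training indices; the function name and the CIFAR-sized example (50,000 training samples) are illustrative, not taken from the authors' code.

```python
import random

def split_train(indices, seed=0):
    """Split training indices 8:1:1 into train / val / meta-val subsets.
    The original test set is never touched by this split."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    idx = list(indices)
    rng.shuffle(idx)
    n = len(idx)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])

# Example: CIFAR-style training set of 50,000 samples.
train, val, meta_val = split_train(range(50000))
```

In practice the three index lists would be wrapped in dataset subsets (e.g. `torch.utils.data.Subset`) before building the data loaders.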
Hardware Specification No The paper mentions using ResNet18 as the base model, which is an architecture, but does not specify any hardware details such as GPU model, CPU type, or memory.
Software Dependencies No We implemented our methods by adapting and extending the code from (Bohdal et al., 2021) with PyTorch (Paszke et al., 2019). While PyTorch is mentioned, a specific version number is not provided.
Experiment Setup Yes For all experiments we used their default settings (using ResNet18 as the base model, batch size 128, data augmented with random crop and horizontal flip) unless otherwise stated. Each experiment was run 5 times with different random seeds, and results were averaged. ... The models were trained with SGD (learning rate 0.1, momentum 0.9, weight decay 0.0005) for up to 350 epochs. The learning rate was decreased at 150 and 250 epochs by a factor of 10. ... The hidden dimension is set to 512, the temperature τ is fixed at 0.01. For SECE, we used the Gaussian kernel with a bandwidth of 0.01 (selected via grid search) for both datasets. We initialized γ = 1.0.
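The stated learning-rate schedule (0.1, divided by 10 at epochs 150 and 250, i.e. standard multi-step decay as in PyTorch's `MultiStepLR`) can be expressed as a small dependency-free sketch; the function name is illustrative.

```python
def lr_at_epoch(epoch, base_lr=0.1, milestones=(150, 250), gamma=0.1):
    """Multi-step decay: multiply the base learning rate by `gamma`
    once for every milestone epoch that has been reached."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr
```

With the paper's settings this yields 0.1 for epochs 0-149, 0.01 for 150-249, and 0.001 from epoch 250 until training ends at epoch 350.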