Label Smoothing is a Pragmatic Information Bottleneck

Authors: Sota Kudo

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental — "Under the assumption of sufficient model flexibility and no conflicting labels for the same input, we theoretically and experimentally demonstrate that the model output obtained through label smoothing explores the optimal solution of the information bottleneck. Based on this, label smoothing can be interpreted as a practical approach to the information bottleneck, enabling simple implementation. As an information bottleneck method, we experimentally show that label smoothing also exhibits the property of being insensitive to factors that do not contain information about the target, or to factors that provide no additional information about it when conditioned on another variable."
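Label smoothing itself is simple to state: the one-hot target is mixed with the uniform distribution over classes before the cross-entropy is computed. A minimal NumPy sketch of this (function names and the smoothing coefficient `eps` are our illustration, not taken from the paper):

```python
import numpy as np

def smooth_labels(labels, num_classes, eps=0.1):
    """Mix one-hot targets with the uniform distribution:
    (1 - eps) * one_hot + eps / num_classes (standard label smoothing)."""
    one_hot = np.eye(num_classes)[labels]
    return (1.0 - eps) * one_hot + eps / num_classes

def cross_entropy(probs, targets):
    """Mean cross-entropy between target distributions and predicted
    class probabilities (rows of `probs` sum to 1)."""
    return float(-np.mean(np.sum(targets * np.log(probs + 1e-12), axis=1)))
```

For example, with 10 classes and `eps=0.1`, the true class receives target probability 0.91 and every other class 0.01, so the targets remain a valid distribution.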
Researcher Affiliation: Academia — Sota Kudo (EMAIL), Nara Institute of Science and Technology
Pseudocode: No — The paper describes methods using mathematical equations and textual descriptions, without providing any structured pseudocode or algorithm blocks.
Open Source Code: No — The paper does not contain an explicit statement or a link indicating that the source code for the methodology described is publicly available.
Open Datasets: Yes — "We conduct experiments with four datasets: CIFAR-10 (Krizhevsky et al., 2009), Flowers-102 (Nilsback & Zisserman, 2008), Occluded CIFAR (Achille & Soatto, 2018), and a variant of Cluttered MNIST (Mnih et al., 2014). Our learning setup is based on established standard settings. [...] We use the CIFAR-10H dataset (Peterson et al., 2019), which provides multiple labels for each of the 10,000 images in the CIFAR-10 test set."
Dataset Splits: Yes — "We conduct experiments with four datasets: CIFAR-10 (Krizhevsky et al., 2009), Flowers-102 (Nilsback & Zisserman, 2008), Occluded CIFAR (Achille & Soatto, 2018), and a variant of Cluttered MNIST (Mnih et al., 2014). Our learning setup is based on established standard settings. [...] For the CIFAR-10 and Occluded CIFAR datasets, we adopt a standard training setup with ResNet (He et al., 2015) [...] We use the CIFAR-10H dataset (Peterson et al., 2019), which provides multiple labels for each of the 10,000 images in the CIFAR-10 test set."
Hardware Specification: No — The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies: No — The paper mentions several software components, such as ResNet, SGD with momentum, and the Adam optimizer, but does not provide specific version numbers for any of them.
Experiment Setup: Yes — "Our learning setup is based on established standard settings. However, to investigate the effects of label smoothing, we have not employed certain other regularization techniques that could potentially interfere with this objective. For the CIFAR-10 and Occluded CIFAR datasets, we adopt a standard training setup with ResNet (He et al., 2015), while weight decay is removed. The training lasts for 160 epochs, with an initial learning rate of 0.1, which is multiplied by 0.1 at epochs 80 and 120. The model architecture is ResNet-56, and the optimizer is SGD with momentum 0.9. The setup for Cluttered MNIST is the same as above, but the architecture used is ResNet-20. For the Flowers-102 dataset, we adopt the training settings used in Hassani et al. (2021), while removing auto-augmentation, mixup, and cutmix. The model used is the Compact Convolutional Transformer (CCT-7/7x2) (Hassani et al., 2021)."
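The quoted CIFAR-10 schedule (initial learning rate 0.1, multiplied by 0.1 at epochs 80 and 120, over 160 epochs) is a standard step schedule and can be sketched as a small helper; the function name and keyword arguments here are our illustration, not code from the paper:

```python
def learning_rate(epoch, base_lr=0.1, milestones=(80, 120), gamma=0.1):
    """Step learning-rate schedule matching the reported CIFAR-10 setup:
    start at `base_lr` and multiply by `gamma` at each milestone epoch."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr
```

So epochs 0–79 train at 0.1, epochs 80–119 at 0.01, and epochs 120–159 at 0.001; in a framework like PyTorch the same behavior would typically come from a built-in multi-step scheduler.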