Exactly Tight Information-theoretic Generalization Bounds via Binary Jensen-Shannon Divergence

Authors: Yuxin Dong, Haoran Guo, Tieliang Gong, Wen Wen, Chen Li

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we assess the tightness of our exactly tight Binary JS bound (Corollary 3.10) in comparison to several existing information-theoretic generalization bounds from the literature. These include the Fast-Rate bound (Theorem 4.3, (Wang & Mao, 2023a)), the Binary KL bound (Theorem 5, (Hellström & Durisi, 2022b)), and the f-information series of oracle bounds: CMI, CSHI, and CJSI (Theorems 3.1, 3.2, and 3.3, (Wang & Mao, 2024)). Our experimental settings align closely with those in (Wang & Mao, 2024), where we evaluate three distinct classification tasks: a simple linear classifier on a synthetic Gaussian dataset; a 4-layer CNN on binarized MNIST (classes 4 vs. 9); and a pretrained ResNet-50 model on CIFAR10. [...] The final results, presented in Figure 3, demonstrate that our Binary JS bound fully captures the dynamics of the generalization error.
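For readers unfamiliar with the quantity in the paper's title, the binary Jensen-Shannon divergence compares two Bernoulli distributions via their binary KL divergences to the midpoint distribution. The sketch below is an illustrative implementation of that standard definition, not code from the paper's repository; the function names and the `eps` clipping are our own choices.

```python
import math

def binary_kl(p, q, eps=1e-12):
    """Binary KL divergence d(p || q) between Bernoulli(p) and Bernoulli(q)."""
    # Clip to the open interval (0, 1) to avoid log(0) at the boundary.
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def binary_js(p, q):
    """Binary Jensen-Shannon divergence between Bernoulli(p) and Bernoulli(q):
    the average of the binary KLs of p and q to their midpoint m = (p + q) / 2."""
    m = 0.5 * (p + q)
    return 0.5 * binary_kl(p, m) + 0.5 * binary_kl(q, m)
```

By construction this quantity is symmetric in its arguments, vanishes when p = q, and is bounded above by log 2, which is what makes it attractive for stating bounds that remain finite.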
Researcher Affiliation Academia School of Computer Science and Technology, Xi'an Jiaotong University. Correspondence to: Tieliang Gong <EMAIL>.
Pseudocode No The paper describes methods mathematically and in text, without presenting any structured pseudocode or algorithm blocks.
Open Source Code Yes https://github.com/Yuxin-Dong/BinaryJS.
Open Datasets Yes Our experimental settings align closely with those in (Wang & Mao, 2024), where we evaluate three distinct classification tasks: a simple linear classifier on a synthetic Gaussian dataset; a 4-layer CNN on binarized MNIST (classes 4 vs. 9); and a pretrained ResNet-50 model on CIFAR10.
Dataset Splits No Our synthetic experimental settings closely follow those in (Wang & Mao, 2024), where synthetic Gaussian datasets are generated using the scikit-learn package. The task involves training a 1-layer linear classification network on 5-dimensional input data points. [...] In addition, we replicate the experimental settings of (Harutyunyan et al., 2021; Hellström & Durisi, 2022b) for two distinct real-world learning tasks: 1) MNIST (4 vs. 9) classification using a 4-layer CNN network, 2) CIFAR10 classification using a pretrained ResNet-50 network. However, the paper does not explicitly state the specific training/validation/test splits with percentages or counts for these datasets in the main text.
Hardware Specification Yes The deep learning models are trained with an Intel Xeon CPU (2.10GHz, 48 cores), 256GB memory, and 4 Nvidia Tesla V100 GPUs (32GB).
Software Dependencies No Our synthetic experimental settings closely follow those in (Wang & Mao, 2024), where synthetic Gaussian datasets are generated using the scikit-learn package. The paper mentions a software package but does not provide specific version numbers for it or any other key software components.
Experiment Setup Yes The model is trained using full-batch gradient descent with a fixed learning rate of 0.01 for 300 epochs. [...] For each learning task, k1 instances of e Z are sampled, and for each e Z, k2 samples of U are drawn, yielding k1 × k2 independent runs in total. The values of (k1, k2) are (5, 30) for MNIST and (2, 40) for CIFAR10, respectively. [...] CNN and ResNet-50 models are trained with mini-batch-based iterative learning algorithms such as SGD and SGLD.
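The quoted k1 × k2 protocol (sample k1 supersamples, then k2 membership masks per supersample) can be sketched as a generic Monte-Carlo loop. This is our own illustrative reconstruction of such an evaluation scheme, not the authors' code: `sample_supersample` and `train_and_eval` are hypothetical callbacks standing in for the paper's dataset construction and training pipeline.

```python
import random

def estimate_gen_error(train_and_eval, sample_supersample, k1, k2, seed=0):
    """Monte-Carlo estimate of the expected generalization gap using
    k1 supersamples and, for each, k2 membership masks U,
    giving k1 * k2 independent runs in total.

    train_and_eval(z_tilde, u) is assumed to train on the examples of
    z_tilde selected by mask u and return (train_err, test_err).
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(k1):
        z_tilde = sample_supersample(rng)   # e.g. n pairs of examples
        for _ in range(k2):
            # One Bernoulli bit per pair selects the training member.
            u = [rng.randint(0, 1) for _ in z_tilde]
            train_err, test_err = train_and_eval(z_tilde, u)
            gaps.append(test_err - train_err)
    return sum(gaps) / len(gaps)
```

With (k1, k2) = (5, 30) for MNIST and (2, 40) for CIFAR10 as reported, this loop would perform 150 and 80 training runs, respectively, which explains why k1 is kept small for the more expensive ResNet-50 task.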