Exactly Tight Information-theoretic Generalization Bounds via Binary Jensen-Shannon Divergence
Authors: Yuxin Dong, Haoran Guo, Tieliang Gong, Wen Wen, Chen Li
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we assess the tightness of our exactly tight Binary JS bound (Corollary 3.10) in comparison to several existing information-theoretic generalization bounds from the literature. These include the Fast-Rate bound (Theorem 4.3, (Wang & Mao, 2023a)), the Binary KL bound (Theorem 5, (Hellström & Durisi, 2022b)), and the f-information series of oracle bounds: CMI, CSHI, and CJSI (Theorems 3.1, 3.2, and 3.3, (Wang & Mao, 2024)). Our experimental settings align closely with those in (Wang & Mao, 2024), where we evaluate three distinct classification tasks: a simple linear classifier on a synthetic Gaussian dataset; a 4-layer CNN on binarized MNIST (classes 4 vs. 9); a pretrained ResNet-50 model on CIFAR10. [...] The final results, presented in Figure 3, demonstrate that our Binary JS bound fully captures the dynamics of the generalization error. |
| Researcher Affiliation | Academia | School of Computer Science and Technology, Xi'an Jiaotong University. Correspondence to: Tieliang Gong <EMAIL>. |
| Pseudocode | No | The paper describes methods mathematically and in text, without presenting any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://github.com/Yuxin-Dong/BinaryJS |
| Open Datasets | Yes | Our experimental settings align closely with those in (Wang & Mao, 2024), where we evaluate three distinct classification tasks: a simple linear classifier on a synthetic Gaussian dataset; a 4-layer CNN on binarized MNIST (classes 4 vs. 9); a pretrained ResNet-50 model on CIFAR10. |
| Dataset Splits | No | Our synthetic experimental settings closely follow those in (Wang & Mao, 2024), where synthetic Gaussian datasets are generated using the scikit-learn package. The task involves training a 1-layer linear classification network on 5-dimensional input data points. [...] In addition, we replicate the experimental settings of (Harutyunyan et al., 2021; Hellström & Durisi, 2022b) for two distinct real-world learning tasks: 1) MNIST (4 vs. 9) classification using a 4-layer CNN network, 2) CIFAR10 classification using a pretrained ResNet-50 network. However, the paper does not explicitly state the specific training/validation/test splits with percentages or counts for these datasets in the main text. |
| Hardware Specification | Yes | The deep learning models are trained with an Intel Xeon CPU (2.10GHz, 48 cores), 256GB memory, and 4 Nvidia Tesla V100 GPUs (32GB). |
| Software Dependencies | No | Our synthetic experimental settings closely follow those in (Wang & Mao, 2024), where synthetic Gaussian datasets are generated using the scikit-learn package. The paper mentions a software package but does not provide specific version numbers for it or any other key software components. |
| Experiment Setup | Yes | The model is trained using full-batch gradient descent with a fixed learning rate of 0.01 for 300 epochs. [...] For each learning task, k₁ instances of Z̃ are sampled, and for each Z̃, k₂ samples of U are drawn, yielding k₁ × k₂ independent runs in total. The values of (k₁, k₂) are (5, 30) for MNIST and (2, 40) for CIFAR10, respectively. [...] CNN and ResNet-50 models are trained with mini-batch-based iterative learning algorithms such as SGD and SGLD. |
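The (k₁, k₂) sampling scheme quoted in the Experiment Setup row can be sketched as a nested loop: draw k₁ supersamples Z̃, and for each, draw k₂ membership masks U, giving k₁ × k₂ independent runs. The sketch below is a minimal illustration of that structure only; `sample_supersample`, `sample_mask`, and `train_and_eval` are hypothetical placeholders, not the authors' code.

```python
import random


def sample_supersample(rng, size=10):
    # Placeholder: draw a supersample Z~ (here, toy scalar "examples").
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def sample_mask(rng, n):
    # Placeholder: draw a membership mask U in {0, 1}^n.
    return [rng.randint(0, 1) for _ in range(n)]


def train_and_eval(z, u):
    # Placeholder for one full training run; returns a toy statistic
    # standing in for the measured generalization gap.
    selected = [zi for zi, ui in zip(z, u) if ui]
    return sum(selected) / max(1, len(selected))


def run_experiment(k1, k2, seed=0):
    """Perform k1 draws of Z~ and, for each, k2 draws of U:
    k1 * k2 independent runs in total."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(k1):            # k1 supersamples Z~
        z = sample_supersample(rng)
        for _ in range(k2):        # k2 masks U per supersample
            u = sample_mask(rng, len(z) // 2)
            gaps.append(train_and_eval(z, u))
    return gaps


# (k1, k2) = (5, 30) for MNIST and (2, 40) for CIFAR10, per the paper.
mnist_gaps = run_experiment(k1=5, k2=30)
cifar_gaps = run_experiment(k1=2, k2=40)
```

With the paper's settings this yields 150 runs for MNIST and 80 for CIFAR10.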