Improving Generalization of Complex Models under Unbounded Loss Using PAC-Bayes Bounds

Authors: Xitong Zhang, Avrajit Ghosh, Guangliang Liu, Rongrong Wang

TMLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Our comprehensive evaluations across various classification tasks and neural network architectures demonstrate that the proposed method not only outperforms existing PAC-Bayes training algorithms but also approximately matches the test accuracy of ERM that is optimized by SGD/Adam using various regularization methods with optimal hyperparameters. We provide mathematical analysis to support the proposed algorithm and conduct extensive numerical experiments to demonstrate its effectiveness. In this section, we demonstrate the efficacy of the proposed PAC-Bayes training algorithm through extensive numerical experiments.
Researcher Affiliation Academia Xitong Zhang, Department of Computational Mathematics, Science and Engineering, Michigan State University; Avrajit Ghosh, Department of Computational Mathematics, Science and Engineering, Michigan State University; Guangliang Liu, Department of Computer Science and Engineering, Michigan State University; Rongrong Wang, Department of Computational Mathematics, Science and Engineering and Department of Mathematics, Michigan State University
Pseudocode Yes Algorithm 1 PAC-Bayes training (scalar prior) ... Algorithm 2 Compute K(λ) given a set of query priors ... Algorithm 3 PAC-Bayes training (layerwise prior)
Open Source Code No The paper does not contain any explicit statement about releasing source code or a link to a code repository. The only link provided is to an OpenReview forum, which is not a code repository.
Open Datasets Yes We tested our PAC-Bayes training on CIFAR10 and CIFAR100 datasets... We evaluated it on graph neural networks (GNNs)... on 5 benchmark datasets Cora ML, Citeseer, PubMed, Cora and DBLP (Bojchevski & Günnemann, 2017). We conducted experiments on two text classification tasks of the GLUE benchmark as shown in Table 6. SST is the sentiment analysis task... QNLI (Question-answering Natural Language Inference)...
Dataset Splits Yes We follow the convention for graph datasets by randomly assigning 20 nodes per class for training, 500 for validation, and the remaining for testing. To simulate a few-shot learning scenario, we randomly sample 100 instances from the original training set and take the whole development set to evaluate the classification performance. We split the training set into 5 splits, taking one split as the validation data and the rest as the training set.
Hardware Specification Yes We conducted experiments using eight A5000 GPUs with four AMD EPYC 7543 32-core Processors.
Software Dependencies No The paper mentions general tools like Adam and SGD optimizers and model architectures like BERT, but it does not specify any software libraries or frameworks with their version numbers (e.g., PyTorch 1.9, TensorFlow 2.x, Python 3.8).
Experiment Setup Yes The batch size is 250 for all methods. The batch size was set to be 128. For all convolutional neural networks, our method employed Adam with a fixed learning rate of 1e-4. We conducted a hyperparameter search over learning rate (1e-3 to 1e-2), weight decay (0 to 1e-2), noise injection (0 to 1e-2), and dropout (0 to 0.8) and reported the highest test accuracy as the baseline result. For our method, we used Adam and fixed the learning rate to be 1e-2 for all graph neural networks. The number of filters per layer is 32 in GCN... For GAT... the number of filters is 8 per layer, the number of heads is 8, and the dropout rate of the attention coefficient is 0.6. For APPNP... the number of filters is 32, K = 10 and α = 0.1. We set the number of layers to 2. The learning rate and batch size of our method are set to 1e-3 and 100 (i.e., full-batch), respectively.
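The graph-dataset split quoted under Dataset Splits (20 nodes per class for training, 500 for validation, the rest for testing) can be sketched as below. This is a minimal illustration, not the authors' code; the helper name `split_nodes` and the use of a seeded shuffle are assumptions.

```python
import random

def split_nodes(labels, num_classes, per_class=20, num_val=500, seed=0):
    """Hypothetical sketch of the split described in the paper:
    randomly assign `per_class` nodes per class to training,
    `num_val` nodes to validation, and the remainder to testing."""
    rng = random.Random(seed)
    train, remaining = [], []
    for c in range(num_classes):
        idx = [i for i, y in enumerate(labels) if y == c]
        rng.shuffle(idx)
        train.extend(idx[:per_class])       # 20 nodes per class by default
        remaining.extend(idx[per_class:])
    rng.shuffle(remaining)
    val, test = remaining[:num_val], remaining[num_val:]
    return train, val, test
```

The same pattern extends to the 5-way split of the training set mentioned for the GLUE tasks by partitioning `train` into five folds and holding one out as validation.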
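The baseline hyperparameter search quoted under Experiment Setup (learning rate 1e-3 to 1e-2, weight decay 0 to 1e-2, noise injection 0 to 1e-2, dropout 0 to 0.8, reporting the best test accuracy) can be sketched as a plain grid search. The specific grid values and the `train_and_eval` callback are assumptions for illustration; the paper only states the search ranges.

```python
import itertools

# Hypothetical grid values within the ranges quoted from the paper.
search_space = {
    "lr": [1e-3, 3e-3, 1e-2],
    "weight_decay": [0.0, 1e-4, 1e-3, 1e-2],
    "noise_injection": [0.0, 1e-3, 1e-2],
    "dropout": [0.0, 0.2, 0.5, 0.8],
}

def grid_search(train_and_eval, space):
    """Try every configuration in the grid and keep the one with the
    highest accuracy returned by `train_and_eval` (a user-supplied
    function that trains a model and returns its test accuracy)."""
    best_acc, best_cfg = float("-inf"), None
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        cfg = dict(zip(keys, values))
        acc = train_and_eval(cfg)
        if acc > best_acc:
            best_acc, best_cfg = acc, cfg
    return best_cfg, best_acc
```

In practice `train_and_eval` would run one full training of the baseline with the given configuration; the best accuracy found is what the reviewers describe as "the highest test accuracy ... reported as the baseline result".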