Generalization Bounds via Meta-Learned Model Representations: PAC-Bayes and Sample Compression Hypernetworks

Authors: Benjamin Leblanc, Mathieu Bazinet, Nathaniel D’Amours, Alexandre Drouin, Pascal Germain

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We now study the performance of models learned using our meta-learning framework, as well as the quality of the obtained bounds. We then report results on a synthetic meta-learning task (Section 4.2) and two real-world meta-learning tasks (Sections 4.3 and 4.4)."
Researcher Affiliation | Collaboration | "1Département d'informatique et de génie logiciel, Université Laval, Québec, Canada; 2ServiceNow Research, Montréal, Canada. Correspondence to: Benjamin Leblanc <EMAIL>."
Pseudocode | Yes | "The following pseudocode depicts our Sample Compression Hypernetworks approach. Algorithm 1: Training of the Sample Compression Hypernetworks (with messages) architecture."
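The quoted Algorithm 1 is not reproduced here, but the core structural idea — a hypernetwork that maps a small compression set plus a short message to the weights of a downstream predictor — can be sketched as follows. This is a minimal, hypothetical NumPy illustration of the forward pass only (not the paper's meta-training loop); the function names, the linear/tanh hypernetwork, and the toy task are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def hypernetwork(compression_set, message, W):
    """Hypothetical hypernetwork: maps a flattened compression set plus a
    short message vector to the weights of a downstream linear predictor."""
    features = np.concatenate([compression_set.ravel(), message])
    return np.tanh(W @ features)  # predicted downstream weights

def downstream_predict(weights, X):
    # Downstream model: a linear classifier whose weights were produced
    # by the hypernetwork rather than trained directly.
    return np.sign(X @ weights)

# Toy task: 2-D points with labels from a fixed linear rule.
X = rng.normal(size=(40, 2))
y = np.sign(X @ np.array([1.0, -1.0]))

# A small compression set of k support examples and a placeholder message.
k = 3
idx = rng.choice(len(X), size=k, replace=False)
compression_set = X[idx]
message = np.array([1.0, 0.0])

# Randomly initialised hypernetwork parameters (meta-learned in the paper).
W = rng.normal(size=(2, k * 2 + len(message))) * 0.1

weights = hypernetwork(compression_set, message, W)
error = np.mean(downstream_predict(weights, X) != y)
```

A sample compression bound would then be evaluated as a function of the compression set size k, the message length, and the query error of the reconstructed predictor.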
Open Source Code | Yes | "The code for all experiments is available at https://github.com/GRAAL-Research/DeepRM."
Open Datasets | Yes | "We first conduct an experiment on the moons 2-D synthetic dataset from Scikit-learn (Pedregosa et al., 2011), which consists of two interleaving half circles with small Gaussian noise... based on augmentations of the MNIST dataset (LeCun et al., 1998)... Binary MNIST and CIFAR100 Tasks"
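The moons dataset mentioned above is directly available from Scikit-learn; a minimal loading sketch (sample count and noise level here are illustrative choices, not the paper's settings):

```python
from sklearn.datasets import make_moons

# Two interleaving half circles with small Gaussian noise.
X, y = make_moons(n_samples=500, noise=0.1, random_state=42)
```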
Dataset Splits | Yes | "We randomly split each dataset into support and query sets of equal size... the meta-training set consists of 10 tasks of 60 000 training examples, while the meta-test set consists of 20 tasks of 2000 examples... Each training task contains 2000 (1200) examples from the train split of the MNIST (CIFAR100) original task, while each test task contains at most 2000 (200) examples from the test split of the original task."
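The equal-size support/query split described above can be sketched with a single random permutation. The helper name and the synthetic task data are hypothetical; only the split scheme comes from the quoted text.

```python
import numpy as np

def support_query_split(X, y, rng):
    """Randomly split a task's examples into support and query sets of
    equal size (hypothetical helper mirroring the described split)."""
    perm = rng.permutation(len(X))
    half = len(X) // 2
    sup, qry = perm[:half], perm[half:2 * half]
    return (X[sup], y[sup]), (X[qry], y[qry])

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))          # one task of 2000 examples
y = rng.integers(0, 2, size=2000)
(support_X, support_y), (query_X, query_y) = support_query_split(X, y, rng)
```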
Hardware Specification | Yes | "The experiments were conducted using an NVIDIA GeForce RTX 2080 Ti graphics card."
Software Dependencies | No | "We used the Adam optimizer (Kingma & Ba, 2015) and trained for at most 200 epochs... We initialized the weights of each module using the Kaiming uniform technique (He et al., 2015)."
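The two cited techniques are standard and can be written out explicitly. The sketch below shows the Kaiming uniform initialization (He et al., 2015) and a single Adam update (Kingma & Ba, 2015) in plain NumPy; the paper presumably used a deep-learning framework's built-ins, so this is only an illustration of the update rules, with all function names being my own.

```python
import numpy as np

def kaiming_uniform(fan_in, fan_out, rng):
    # Kaiming uniform init (He et al., 2015): U(-b, b) with
    # b = sqrt(6 / fan_in), i.e. gain sqrt(2) for ReLU activations.
    bound = np.sqrt(6.0 / fan_in)
    return rng.uniform(-bound, bound, size=(fan_out, fan_in))

def adam_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update (Kingma & Ba, 2015) with bias-corrected moments.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

rng = np.random.default_rng(0)
W = kaiming_uniform(200, 100, rng)   # e.g. a 200 -> 100 layer
m = np.zeros_like(W)
v = np.zeros_like(W)
grad = np.ones_like(W)               # dummy gradient
W_next, m, v = adam_step(W, grad, m, v, t=1)
```

On the first step the bias-corrected moments equal the raw gradient statistics, so every parameter moves by approximately the learning rate.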
Experiment Setup | Yes | "Detailed hyperparameters used for each experiment are given in Appendix H. Learning rate: 1e-3, 1e-4; MLP1: [200, 200], [500, 500]; MLP2: [100], [200]; MLP3: [100], [200, 200]; c: 0, 1, 2, 4, 8; |µ|, b: 0, 1, 2, 4, 8, 16, 32, 64, 128."
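The hyperparameter lists quoted above describe a grid search; a hedged reconstruction of that grid is sketched below. The dictionary keys are my own labels (the quote does not define "c", "|µ|", or "b" precisely), and |µ| and b are treated as independent grid dimensions.

```python
from itertools import product

# Hypothetical reconstruction of the grid quoted from Appendix H.
grid = {
    "lr": [1e-3, 1e-4],
    "mlp1": [[200, 200], [500, 500]],
    "mlp2": [[100], [200]],
    "mlp3": [[100], [200, 200]],
    "c": [0, 1, 2, 4, 8],
    "mu_size": [0, 1, 2, 4, 8, 16, 32, 64, 128],  # |µ|
    "b": [0, 1, 2, 4, 8, 16, 32, 64, 128],
}
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
```

Enumerating the full Cartesian product gives 2 × 2 × 2 × 2 × 5 × 9 × 9 = 6480 configurations, which is why papers typically report only the selected values per experiment.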