Generalization Bounds via Meta-Learned Model Representations: PAC-Bayes and Sample Compression Hypernetworks
Authors: Benjamin Leblanc, Mathieu Bazinet, Nathaniel D’Amours, Alexandre Drouin, Pascal Germain
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now study the performance of models learned using our meta-learning framework as well as the quality of the obtained bounds. Then, we report results on a synthetic meta-learning task (Section 4.2) and two real-world meta-learning tasks (Sections 4.3 and 4.4). |
| Researcher Affiliation | Collaboration | 1Département d'informatique et de génie logiciel, Université Laval, Québec, Canada 2ServiceNow Research, Montréal, Canada. Correspondence to: Benjamin Leblanc <EMAIL>. |
| Pseudocode | Yes | The following pseudocode depicts our Sample Compression Hypernetworks approach. Algorithm 1: Training of the Sample Compression Hypernetwork (with messages) architecture. |
| Open Source Code | Yes | The code for all experiments is available at https://github.com/GRAAL-Research/DeepRM. |
| Open Datasets | Yes | We first conduct an experiment on the moons 2-D synthetic dataset from Scikit-learn (Pedregosa et al., 2011), which consists of two interleaving half-circles with small Gaussian noise... based on augmentations of the MNIST dataset (LeCun et al., 1998)... Binary MNIST and CIFAR100 Tasks |
| Dataset Splits | Yes | We randomly split each dataset into support and query of equal size... the meta-training set consists of 10 tasks of 60 000 training examples, while the meta-test set consists of 20 tasks of 2000 examples... Each training task contains 2000 (1200) examples from the train split of the MNIST (CIFAR100) original task, while each test task contains at most 2000 (200) examples from the test split of the original task. |
| Hardware Specification | Yes | The experiments were conducted using an NVIDIA GeForce RTX 2080 Ti graphics card. |
| Software Dependencies | No | We used the Adam optimizer (Kingma & Ba, 2015) and trained for at most 200 epochs... We initialized the weights of each module using the Kaiming uniform technique (He et al., 2015). |
| Experiment Setup | Yes | Detailed hyperparameters used for each experiment are given in Appendix H. Learning rate: 1e-3, 1e-4; MLP1: [200, 200], [500, 500]; MLP2: [100], [200]; MLP3: [100], [200, 200]; c: 0, 1, 2, 4, 8; |µ|, b: 0, 1, 2, 4, 8, 16, 32, 64, 128. |
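The dataset-split evidence above describes a 50/50 support/query split of the Scikit-learn moons task. A minimal sketch of what such a split could look like, assuming standard Scikit-learn utilities (this is illustrative, not the authors' code; the function name and sizes are hypothetical):

```python
# Illustrative sketch (not the paper's implementation): build one 2-D moons
# task and split it into equal-sized support and query sets, mirroring the
# "support and query of equal size" description quoted above.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split


def make_moons_task(n_examples=400, noise=0.1, seed=0):
    """Generate a moons task and split it 50/50 into support and query."""
    X, y = make_moons(n_samples=n_examples, noise=noise, random_state=seed)
    # Stratify so both halves keep the same class balance.
    X_sup, X_qry, y_sup, y_qry = train_test_split(
        X, y, test_size=0.5, random_state=seed, stratify=y
    )
    return (X_sup, y_sup), (X_qry, y_qry)


(support_X, support_y), (query_X, query_y) = make_moons_task()
print(support_X.shape, query_X.shape)  # (200, 2) (200, 2)
```

In a meta-learning setup, a loop over seeds would generate one such task per meta-training or meta-test episode; the exact task counts and sizes used by the paper are those quoted in the Dataset Splits row.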