Generalization Guarantees for Representation Learning via Data-Dependent Gaussian Mixture Priors
Authors: Milad Sefidgaran, Abdellatif Zaidi, Piotr Krasnowski
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that our approach outperforms the now popular Variational Information Bottleneck (VIB) method as well as the recent Category-Dependent VIB (CDVIB). We validate the advantages of our generalization-aware regularizer in practice through experiments using various datasets (CIFAR10, CIFAR100, INTEL, and USPS) and encoder architectures (CNN4 and ResNet18). |
| Researcher Affiliation | Collaboration | Paris Research Center, Huawei Technologies France; Université Gustave Eiffel, France |
| Pseudocode | No | The paper describes methods using mathematical equations and prose (e.g., Section 4, 4.1, 4.2, Appendix C), but does not contain any distinct 'Pseudocode' or 'Algorithm' blocks with structured, code-like steps. Mathematical formulas are provided for updates and regularizers instead of pseudocode. |
| Open Source Code | Yes | The code used in the experiments is available at https://github.com/PiotrKrasnowski/Gaussian_Mixture_Priors_for_Representation_Learning. |
| Open Datasets | Yes | Datasets: CIFAR10, CIFAR100, INTEL, and USPS image classification... CIFAR10 (KH09)... CIFAR100 (KH09)... USPS (Hul94)... INTEL... Footnote 3: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html#usps. Footnote 4: https://www.kaggle.com/datasets/puneet6060/intel-image-classification |
| Dataset Splits | No | The paper mentions using a 'training dataset S' and a 'test dataset (known also as ghost dataset) S1' and lists the datasets (CIFAR10, CIFAR100, INTEL, and USPS). However, it does not specify the exact percentages, sample counts, or the methodology used to split these datasets into training, validation, and testing sets for the experiments. |
| Hardware Specification | Yes | The PyTorch library (PGM19) and a GPU Tesla P100 with CUDA 11.0 were utilized to train our prediction model. |
| Software Dependencies | Yes | The PyTorch library (PGM19) and a GPU Tesla P100 with CUDA 11.0 were utilized to train our prediction model. |
| Experiment Setup | Yes | For optimization, we used the Adam optimizer (KB15) with parameters β1 = 0.5 and β2 = 0.999, an initial learning rate of 10^-4, an exponential decay of 0.97, and a batch size of 128. We trained the encoder and decoder models for 200 epochs five times independently... For the Gaussian mixture objective function, we selected M = 20 priors for each class category. ... The priors were updated after each training iteration using the procedure in C.3 with a moving average coefficient η1 = 1e-2 for the prior means µ_{c,m}, η2 = 5e-4 for the prior variances σ²_{c,m}, and η3 = 1e-2 for the mixture weights α_{c,m}. Following the approach outlined in (AFDM17), we generated one latent sample per image during training and 12 samples during testing. |
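The moving-average prior updates quoted in the experiment-setup row can be sketched as below. This is a minimal NumPy sketch under stated assumptions: the paper's exact procedure is in its Appendix C.3, and the exponential-moving-average form `p ← (1 − η)·p + η·p̂`, the function name `update_priors`, and all array shapes are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Coefficients quoted in the paper's setup (per-class mixture of M = 20 priors).
M = 20                                           # priors per class category
ETA_MU, ETA_VAR, ETA_ALPHA = 1e-2, 5e-4, 1e-2    # moving-average coefficients

# Adam settings quoted in the paper (shown here only as a config dict):
ADAM_CFG = dict(lr=1e-4, betas=(0.5, 0.999))     # plus exponential LR decay 0.97, batch size 128

def update_priors(mu, var, alpha, mu_hat, var_hat, alpha_hat):
    """Exponential moving-average update of the Gaussian mixture priors.

    mu, var, alpha             -- current prior means, variances, mixture weights
    mu_hat, var_hat, alpha_hat -- per-iteration estimates from the current batch
    (Assumed EMA rule; the paper applies its Appendix C.3 procedure here.)
    """
    mu    = (1 - ETA_MU)    * mu    + ETA_MU    * mu_hat
    var   = (1 - ETA_VAR)   * var   + ETA_VAR   * var_hat
    alpha = (1 - ETA_ALPHA) * alpha + ETA_ALPHA * alpha_hat
    # Renormalize so each class's M mixture weights still sum to one.
    alpha = alpha / alpha.sum(axis=-1, keepdims=True)
    return mu, var, alpha
```

The small, separate coefficient for the variances (5e-4 vs. 1e-2) makes the prior spreads evolve much more slowly than the means and weights, which is consistent with the slower, more stable statistics one would want for a variance estimate.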