Learning via Wasserstein-Based High Probability Generalisation Bounds
Authors: Paul Viallard, Maxime Haddouche, Umut Simsekli, Benjamin Guedj
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As a result we derive novel Wasserstein-based PAC-Bayesian learning algorithms and we illustrate their empirical advantage on a variety of experiments. We present in Table 1 the performance of Algorithms 1 and 2 compared to the Empirical Risk Minimisation (ERM) and the Online Gradient Descent (OGD) with the COCOB-Backprop optimiser. |
| Researcher Affiliation | Academia | Paul Viallard (Inria, CNRS, Ecole Normale Supérieure, PSL Research University, Paris, France, EMAIL); Maxime Haddouche (Inria, University College London and Université de Lille, France, EMAIL); Umut Simsekli (Inria, CNRS, Ecole Normale Supérieure, PSL Research University, Paris, France, EMAIL); Benjamin Guedj (Inria and University College London, France and UK, EMAIL) |
| Pseudocode | Yes | Algorithm 1 ((Mini-)Batch Learning Algorithm with Wasserstein distances) and Algorithm 2 (Online Learning Algorithm with Wasserstein distances) are given in Appendix C. |
| Open Source Code | Yes | All the experiments are reproducible with the source code provided on GitHub at https://github.com/paulviallard/NeurIPS23-PB-Wasserstein. |
| Open Datasets | Yes | We study the performance of Algorithms 1 and 2 on UCI datasets [DG17] along with MNIST [LeC98] and Fashion-MNIST [XRV17]. |
| Dataset Splits | No | We also split all the data (from the original training/test set) into two halves: the first half is used by the algorithm (and is treated as the training set), while the second half is used to approximate the population risks Rµ(h) and Cµ (and is treated as the test set). The paper describes splitting data into training and test sets but does not explicitly mention a separate validation set. A minimal sketch of this 50/50 split is given after the table. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments. It only refers to 'models' without specifying the underlying hardware. |
| Software Dependencies | No | The paper mentions using the 'COCOB-Backprop optimiser [OT17]' and implicitly references 'PyTorch' for the multi-margin loss function, but it does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | To perform the gradient steps, we use the COCOB-Backprop optimiser [OT17] (with parameter α = 10000). For Algorithm 1, which solves Equation (5), we fix a batch size of 100, i.e., |U| = 100, and the number of epochs T is fixed so as to perform at least 20000 iterations. Regarding Algorithm 2, which solves Equation (7), we set t = 100 for the log barrier, which is enough to constrain the weights, and the number of iterations to T = 10. In the following, we consider D = 600 and L = 2; more experiments are considered in the appendix. We initialise the network similarly to [DR17] by sampling the weights from a Gaussian distribution with zero mean and a standard deviation of σ = 0.04; the weights are further clipped between −2σ and +2σ. Moreover, the values in the biases b1, …, bL are set to 0.1, while the values for b are set to 0. A minimal sketch of this initialisation is given after the table. |
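
The Dataset Splits row above halves each dataset into a training part and an evaluation part. Below is a minimal sketch of that 50/50 split, assuming NumPy arrays for features and labels; the function name `half_split` and the fixed seed are illustrative, not taken from the authors' repository.

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative fixed seed

def half_split(X, y):
    """Shuffle, then cut the data into two equal halves: the first half
    feeds the learning algorithm (training set), while the second half
    is used to approximate the population risks (test set)."""
    idx = rng.permutation(len(X))
    mid = len(X) // 2
    train, test = idx[:mid], idx[mid:]
    return (X[train], y[train]), (X[test], y[test])
```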
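
The Experiment Setup row describes the network initialisation precisely enough to sketch it. The snippet below is an illustrative PyTorch reading of that description, assuming a fully connected network of width D = 600 with L = 2 hidden layers (zero-mean Gaussian weights with σ = 0.04 kept within ±2σ, hidden biases at 0.1, output bias at 0); it is not the authors' code. The COCOB-Backprop optimiser (α = 10000) is not part of torch.optim and is omitted here.

```python
import torch.nn as nn

D, L_HIDDEN, SIGMA = 600, 2, 0.04

def init_layer(n_in, n_out, bias_value):
    layer = nn.Linear(n_in, n_out)
    # Zero-mean Gaussian with std SIGMA; trunc_normal_ keeps weights in
    # [-2*SIGMA, +2*SIGMA] by resampling. The paper says "clipped": a hard
    # clip would instead be layer.weight.data.clamp_(-2 * SIGMA, 2 * SIGMA).
    nn.init.trunc_normal_(layer.weight, mean=0.0, std=SIGMA,
                          a=-2 * SIGMA, b=2 * SIGMA)
    nn.init.constant_(layer.bias, bias_value)
    return layer

def make_network(n_features, n_classes):
    # Hidden biases start at 0.1 and the output bias at 0, per the setup row.
    layers = [init_layer(n_features, D, 0.1), nn.ReLU()]
    for _ in range(L_HIDDEN - 1):
        layers += [init_layer(D, D, 0.1), nn.ReLU()]
    layers.append(init_layer(D, n_classes, 0.0))
    return nn.Sequential(*layers)
```

Whether weights are truncated by resampling or hard-clipped changes the initial distribution slightly; the repository linked in the Open Source Code row is the authoritative reference.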