A Probabilistic Model behind Self-Supervised Learning

Authors: Alice Bizeul, Bernhard Schölkopf, Carl Allen

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Learning representations by fitting the model generatively (termed SimVAE) improves performance over discriminative and other VAE-based methods on simple image benchmarks and significantly narrows the gap between generative and discriminative representation learning in more complex settings. Overall, our results provide empirical support for the SSL Model as a mathematical basis for self-supervised learning and suggest that SSL methods may overfit to content classification tasks."
Researcher Affiliation | Academia | Alice Bizeul (EMAIL), Department of Computer Science & ETH AI Center, ETH Zurich; Bernhard Schölkopf (EMAIL), Max Planck Institute for Intelligent Systems, Tübingen; Carl Allen (EMAIL), Department of Computer Science & ETH AI Center, ETH Zurich
Pseudocode | Yes | Algorithm 1 (SimVAE):

    Require: data {x_i}_{i=1}^M; batch size N; data dim D; latent dim L; augmentation set T;
             number of augmentations J; encoder f_φ; decoder g_θ; variance of z|y, σ²
    for each randomly sampled mini-batch {x_i}_{i=1}^N do
        for each augmentation t_j ∈ T do
            x_i^j = t_j(x_i)                     # augment samples
            μ_i^j, Σ_i^j = f_φ(x_i^j)            # forward pass: z ~ p_φ(z|x)
            z_i^j ~ N(μ_i^j, Σ_i^j)
            x̂_i^j = g_θ(z_i^j)                   # x̂ = E[x|z; θ]
        end for
        L_rec^i   = D · Σ_{j=1}^J ||x_i^j − x̂_i^j||²₂        # minimize loss
        L_H^i     = −½ · Σ_{j=1}^J log(|Σ_i^j|)
        L_prior^i = ½ · Σ_{j=1}^J ||(z_i^j − (1/J) Σ_{j'=1}^J z_i^{j'}) / σ||²₂
        minimize Σ_{i=1}^N (L_rec^i + L_H^i + L_prior^i) w.r.t. φ, θ by SGD
    end for
    return φ, θ
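The per-sample objective in Algorithm 1 can be sketched in NumPy. This is a hypothetical translation for illustration, not the authors' released code: the function name, argument layout, and the omission of the constant D scaling on the reconstruction term are assumptions.

```python
import numpy as np

def simvae_loss(x, x_hat, z, log_var, sigma=0.15):
    """Sketch of the SimVAE per-sample objective (hypothetical; see Algorithm 1).

    x, x_hat : (J, D) augmented inputs and their reconstructions
    z        : (J, L) latent samples, one per augmentation
    log_var  : (J, L) log of the diagonal posterior variances
    sigma    : std of the prior p(z|y) (0.15 variance is used in the paper's setup)
    """
    # reconstruction: squared error summed over all J augmentations
    # (the constant D scaling from Algorithm 1 is omitted in this sketch)
    l_rec = np.sum((x - x_hat) ** 2)
    # entropy term: -1/2 * sum_j log|Sigma_i^j| for diagonal covariances
    l_h = -0.5 * np.sum(log_var)
    # prior term: pull each latent toward the mean latent of its augmentation set
    z_bar = z.mean(axis=0, keepdims=True)
    l_prior = 0.5 * np.sum(((z - z_bar) / sigma) ** 2)
    return l_rec + l_h + l_prior
```

The prior term is what distinguishes SimVAE from a plain VAE: latents of augmentations of the same image are tied together through their shared mean, rather than each being pulled to a fixed prior.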
Open Source Code | Yes | "The code to reproduce SimVAE can be found at https://github.com/alicebizeul/simvae"
Open Datasets | Yes | "We evaluate SimVAE representations on four datasets including two with natural images: MNIST (LeCun, 1998), FashionMNIST (Xiao et al., 2017), CelebA (Liu et al., 2015) and CIFAR10 (Krizhevsky et al., 2009)."
Dataset Splits | Yes | "The MNIST dataset (LeCun, 1998) gathers 60 000 training and 10 000 testing images representing digits from 0 to 9 in various calligraphic styles. ... The FashionMNIST dataset (Xiao et al., 2017) is a collection of 60 000 training and 10 000 test images ... The CelebA dataset (Liu et al., 2015) comprises a vast collection of celebrity facial images. It encompasses a diverse set of 183 000 high-resolution images (i.e., 163 000 training and 20 000 test images)"
Hardware Specification | Yes | "Models for MNIST, FashionMNIST and CIFAR10 were trained on an RTX 2080 Ti GPU with 12 GB RAM. Models for CelebA were trained on an RTX 3090 GPU with 24 GB RAM."
Software Dependencies | No | The paper mentions "PyTorch's dataset collection" and "Scikit-learn" but does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | "The batch size was fixed to 128. ... For VAEs, the learning rate was set to 8e-5, and the likelihood probability, p(x|z), variance parameter was set to 0.02 for β-VAE, CR-VAE and SimVAE. CR-VAE's λ parameter was set to 0.1. SimVAE's prior probability, p(z|y), variance was set to 0.15 and the number of augmentations to 10. VICReg's parameter µ was set to 25 and the learning rate to 1e-4. SimCLR's temperature parameter, τ, was set to 0.7 ... generative baselines and SimVAE were trained for 400 epochs while discriminative methods were trained for 600 to 800 epochs. Both probes were trained using an Adam optimizer with a learning rate of 3e-4 for 200 epochs with batch size fixed to 128."
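For quick reference, the hyperparameters quoted in the setup above can be collected into a single config sketch. The key names are my own; the values are taken verbatim from the stated experimental setup.

```python
# Hyperparameters quoted from the paper's experimental setup (key names assumed).
EXPERIMENT_CONFIG = {
    "batch_size": 128,
    "vae_learning_rate": 8e-5,           # beta-VAE, CR-VAE, SimVAE
    "likelihood_variance": 0.02,         # p(x|z) variance
    "crvae_lambda": 0.1,
    "simvae_prior_variance": 0.15,       # p(z|y) variance
    "num_augmentations": 10,
    "vicreg_mu": 25,
    "vicreg_learning_rate": 1e-4,
    "simclr_temperature": 0.7,
    "generative_epochs": 400,            # generative baselines and SimVAE
    "discriminative_epochs": (600, 800), # range reported for discriminative methods
    "probe_optimizer": "Adam",
    "probe_learning_rate": 3e-4,
    "probe_epochs": 200,
    "probe_batch_size": 128,
}
```

A dictionary like this makes it easy to check a reproduction attempt against the reported values before training.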