Can Generative Models Improve Self-Supervised Representation Learning?
Authors: Sana Ayromlou, Vahid Reza Khazaie, Fereshteh Forghani, Arash Afkanpour
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experimental results on various joint-embedding SSL techniques demonstrate that our framework significantly enhances the quality of learned visual representations by up to 10% Top-1 accuracy in downstream tasks. |
| Researcher Affiliation | Collaboration | Sana Ayromlou (Vector Institute), Vahid Reza Khazaie (Vector Institute), Fereshteh Forghani (York University), Arash Afkanpour (Vector Institute) |
| Pseudocode | No | The paper describes methods using mathematical equations and descriptions, but no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | Yes | The code to reproduce our empirical study is available at https://github.com/VectorInstitute/GenerativeSSL. |
| Open Datasets | Yes | Our downstream tasks are image classification on the ImageNet (Deng et al. 2009), Food101 (Bossard, Guillaumin, and Van Gool 2014), Places365 (Zhou et al. 2017), iNaturalist 2018 (Van Horn et al. 2018), CIFAR-10, and CIFAR-100 (Krizhevsky and Hinton 2009) datasets. |
| Dataset Splits | Yes | We pretrain a ResNet50 encoder using these SSL techniques for 100 epochs on the ImageNet training split. For evaluation, we follow the linear probing protocol of previous works by training a linear classifier on the output of the frozen encoder. Our downstream tasks are image classification on the ImageNet (Deng et al. 2009), Food101 (Bossard, Guillaumin, and Van Gool 2014), Places365 (Zhou et al. 2017), iNaturalist 2018 (Van Horn et al. 2018), CIFAR-10, and CIFAR-100 (Krizhevsky and Hinton 2009) datasets. For all datasets, except for Places365, a linear classifier is trained on the corresponding training split for 100 epochs. For Places365, training is performed for 45 epochs. After training the linear classifier, we evaluate the classifier on the corresponding validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or processor types used for the experiments. |
| Software Dependencies | No | While the paper mentions "Utilizing the Solo-learn library (Da Costa et al. 2022)", it does not provide specific version numbers for Solo-learn or any other software dependencies. |
| Experiment Setup | Yes | We pretrain a ResNet50 encoder using these SSL techniques for 100 epochs on the ImageNet training split. ... For all datasets, except for Places365, a linear classifier is trained on the corresponding training split for 100 epochs. For Places365, training is performed for 45 epochs. ... the sequence of transformations to create each view is as follows: (1) random crop with the relative crop area selected randomly from [0.2, 1], (2) scale to 224×224, (3) color jitter, (4) grayscale, (5) Gaussian blur, (6) horizontal flip. Each augmentation is applied stochastically with a probability value. ... For this experiment we used Stable Diffusion for synthetic image generation and applied the new augmentation with probability p ∈ {0, 0.25, 0.5, 0.75, 1}. ... Based on the results depicted in RQ1, we apply the generative augmentation with probability p = 0.5 to both views. |
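The generative augmentation described in the Experiment Setup row can be sketched as a simple stochastic wrapper: with probability p (0.5 in the paper's chosen setting) the real view is replaced by a model-generated image, and otherwise the real view is kept. This is a minimal sketch, not the authors' released code; `generate_fn` is a hypothetical stand-in for the Stable Diffusion generation step.

```python
import random

def generative_augment(view, generate_fn, p=0.5):
    """With probability p, replace the real view with a synthetic image
    produced by a generative model (the paper uses Stable Diffusion);
    otherwise return the real view unchanged.

    `generate_fn` is a hypothetical callable standing in for the
    image-generation step; `view` can be any image representation.
    """
    if random.random() < p:
        return generate_fn(view)
    return view

# In the paper's setup, this augmentation is applied with p = 0.5
# to both views before the remaining SSL transformations.
```

In a joint-embedding pipeline this wrapper would sit alongside the standard view transformations (random crop, color jitter, grayscale, Gaussian blur, horizontal flip), so each of the two views independently has a 50% chance of being synthetic.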