Occam’s Razor for SSL: Memory-Efficient Parametric Instance Discrimination

Authors: Eric Gan, Patrik Reizinger, Alice Bizeul, Attila Juhos, Mark Ibrahim, Randall Balestriero, David Klindt, Wieland Brendel, Baharan Mirzasoleiman

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate that DIET (1) can be implemented in a memory-efficient way; (2) achieves competitive performance with state-of-the-art SSL methods on small-scale datasets; and (3) is robust to hyperparameters such as batch size. ... We provide extensive empirical evidence that DIET is competitive on downstream classification with SOTA on small datasets, and it has a higher-rank embedding (6). ... In the previous sections, we evaluate DIET and its extended variant, s-DIET, on standard semantic classification benchmarks.
Researcher Affiliation | Collaboration | Eric Gan (Computer Science Department, University of California Los Angeles); Patrik Reizinger (Max Planck Institute for Intelligent Systems, Tübingen AI Center, ELLIS Institute, Tübingen, Germany); Alice Bizeul (Department of Computer Science & ETH AI Center, ETH Zürich); Attila Juhos (Max Planck Institute for Intelligent Systems, Tübingen AI Center, ELLIS Institute, Tübingen, Germany); Mark Ibrahim (FAIR, Meta); David Klindt (Cold Spring Harbor Laboratory); Randall Balestriero (Computer Science Department, Brown University); Wieland Brendel (Max Planck Institute for Intelligent Systems, Tübingen AI Center, ELLIS Institute, Tübingen, Germany); Baharan Mirzasoleiman (Computer Science Department, University of California Los Angeles)
Pseudocode | Yes | The simplicity of DIET boils down to (cf. Appx. C.1 for pseudocode): ... Algorithm 1: DIET's algorithm and dataset loader. ... Algorithm 2: Get the output dimension and remove the linear classifier from a given torchvision model (PyTorch used for illustration).
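The core idea behind the paper's pseudocode can be illustrated with a minimal PyTorch sketch of parametric instance discrimination: each training sample's dataset index serves as its own classification target for a single linear head on top of the backbone. The dimensions, instance count, and stand-in encoder below are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

# Hedged sketch of the DIET objective: classify each embedding into
# one of N "instance" classes, where class i is simply sample i's
# dataset index. N, d, and the encoder are illustrative placeholders.
N, d = 1000, 128                                        # dataset size, embedding dim
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, d))  # stand-in backbone
head = nn.Linear(d, N, bias=False)                      # one linear classifier over instances
criterion = nn.CrossEntropyLoss(label_smoothing=0.8)    # smoothing value from the paper's setup

x = torch.randn(256, 3, 32, 32)                         # a batch of (augmented) images
idx = torch.randint(0, N, (256,))                       # each image's dataset index = its label
loss = criterion(head(encoder(x)), idx)
loss.backward()                                         # gradients flow to encoder and head
```

The memory-efficiency claim in the paper concerns exactly this head: its weight matrix grows with the dataset size N, which s-DIET addresses.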
Open Source Code | No | No explicit statement about the release of source code for the methodology described in this paper, or a direct link to a code repository, was found. The paper mentions third-party libraries/architectures such as the 'solo-learn library (da Costa et al., 2022)' and https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/vit_for_small_dataset.py, but not its own implementation of DIET.
Open Datasets | Yes | For our models, we study the ResNet family of architectures, specifically ResNet-18 and ResNet-50 (He et al., 2016), and vision transformers (ViT) (Dosovitskiy et al., 2020), specifically ViT-B/16. ... We perform experiments on a toy, a synthetic, and 4 real-world datasets: CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), ImageNet-100 (Tian et al., 2020), and TinyImageNet (Le & Yang, 2015). ... Aircraft (Maji et al., 2013), DTD (Cimpoi et al., 2014), Pets (Parkhi et al., 2012), Flowers (Nilsback & Zisserman, 2008), CUB200 (Wah et al., 2011), Food101 (Bossard et al., 2014), Cars (Krause et al., 2013). ... We compare SSL methods (DIET, SimCLR, MoCo v2, VICReg) trained from scratch on three datasets from the MedMNISTv2 medical imaging benchmark (Yang et al., 2023): (i) PathMNIST (90,000/7,180 train/test split); (ii) DermaMNIST (10,015/2,005 split); and (iii) BloodMNIST (17,092/3,421 split).
Dataset Splits | Yes | We compare SSL methods (DIET, SimCLR, MoCo v2, VICReg) trained from scratch on three datasets from the MedMNISTv2 medical imaging benchmark (Yang et al., 2023): (i) PathMNIST (90,000/7,180 train/test split); (ii) DermaMNIST (10,015/2,005 split); and (iii) BloodMNIST (17,092/3,421 split).
Hardware Specification | Yes | OOM indicates out-of-memory on an NVIDIA A40 GPU. ... Table 4: Training time of s-DIET versus SimCLR on CIFAR-10/100, in hours, on a single NVIDIA A5000 GPU.
Software Dependencies | No | The paper mentions 'PyTorch used for illustration' and the 'AdamW optimizer' but does not provide specific version numbers for these or other software libraries used in its experiments. It also refers to the 'solo-learn library (da Costa et al., 2022)' without a specific version of the library itself.
Experiment Setup | Yes | We use a three-layer ReLU MLP as a projection head during the training of s-DIET. We fix the default label smoothing to 0.8 and the data augmentation pipeline to a combination of cropping, flipping, color jitter, and Gaussian blurring. Details on data augmentation are presented in Alg. 3. We use an AdamW optimizer with a 10^-3 learning rate and 0.05 weight decay with a cosine learning rate decay. We fix the batch size to 256 for all experiments and train DIET and s-DIET for 5000 epochs. Our baselines were trained until convergence using the same data augmentation as for DIET. All baseline hyperparameters were kept at the default values proposed by the original works. After training, we evaluate our representations by training a linear classifier on top of frozen representations to perform semantic classification on the validation set.
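The quoted training configuration is concrete enough to sketch in PyTorch. The backbone below is a placeholder, and the augmentation pipeline (cropping, flipping, color jitter, Gaussian blur; exact parameters are in the paper's Alg. 3) is omitted; only the optimizer, scheduler, and loss settings are taken from the reported setup.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

# Hedged sketch of the reported configuration: AdamW with lr 1e-3 and
# weight decay 0.05, cosine learning-rate decay over 5000 epochs,
# label smoothing 0.8, batch size 256. The model is a placeholder for
# the actual backbone + three-layer ReLU MLP projection head.
model = torch.nn.Linear(512, 256)
opt = AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)
sched = CosineAnnealingLR(opt, T_max=5000)                  # one step per epoch
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.8)

for epoch in range(2):                                      # 5000 epochs in the paper
    # ... iterate over batches of size 256, compute loss, opt.step() ...
    sched.step()
```

Evaluation then trains a separate linear classifier on the frozen representations, as stated in the quoted setup.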