MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments
Authors: Spyros Gidaris, Andrei Bursuc, Oriane Siméoni, Antonín Vobecký, Nikos Komodakis, Matthieu Cord, Patrick Perez
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 Experiments. Setup. We evaluate our MOCA method by training ViT-B/16 models on the ImageNet-1k (Russakovsky et al., 2015) dataset. ... We evaluate the learned representations on the k-NN ImageNet classification task. ... In Tab. 5a, we compare our method MOCA against other self-supervised methods... We further evaluate our method on the Cityscapes (Cordts et al., 2016) semantic segmentation dataset... We present results on COCO detection and instance segmentation in Tab. 7. |
| Researcher Affiliation | Collaboration | Spyros Gidaris1, Andrei Bursuc1, Oriane Siméoni1, Antonín Vobecký1,2,3, Nikos Komodakis4,5,6, Matthieu Cord1, Patrick Pérez1. 1Valeo.ai 2Czech Institute of Informatics, Robotics and Cybernetics at the Czech Technical University in Prague 3Czech Technical University in Prague, Faculty of Electrical Engineering 4University of Crete 5IACM-Forth 6Archimedes/Athena RC. Correspondence: EMAIL |
| Pseudocode | Yes | C.4 Image augmentations pseudo-code. Here we provide PyTorch pseudo-code for the image augmentations used in MOCA for generating the two unmasked random views x1 and x2. import torchvision.transforms as T normalize = T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)) aug_view1 = T.Compose([...]) aug_view2 = T.Compose([...]) |
| Open Source Code | Yes | We provide the implementation code at https://github.com/valeoai/MOCA. |
| Open Datasets | Yes | We evaluate our MOCA method by training ViT-B/16 models on the ImageNet-1k (Russakovsky et al., 2015) dataset. ... We further evaluate our method on the Cityscapes (Cordts et al., 2016) semantic segmentation dataset... We use the COCO 2017 set consisting of 118K training images and 5k validation. |
| Dataset Splits | Yes | We train using the full Cityscapes training set of 2975 images as well as 100 or 374 training images, representing 1/30 and 1/8 of the full training set. For these 100 and 374 low-shot settings, we use three different splits of 100 or 374 training images respectively following the protocol of French et al. (2020) and report the average mIoU performance over the three splits. ... We use the COCO 2017 set consisting of 118K training images and 5k validation. ... Low-shot ImageNet-1k classification. Here we adopt the low-shot evaluation protocol of MSN (Assran et al., 2022) and use as few as 1, 2, or 5 training images per class as well as using 1% of the ImageNet-1k's training data, which corresponds to 13 images per class. |
| Hardware Specification | Yes | The batch size is 2048 split over 8 A100 GPUs. ... Time and Memory: per-epoch training time and GPU memory footprint measured with a single 8-A100 node and batch size 2048. |
| Software Dependencies | No | The PyTorch pseudo-code for this augmentation strategy is provided in Appendix C.4. ... For the logistic regression, we use the cyanure package (Mairal, 2019). (Does not provide specific version numbers for PyTorch or cyanure.) |
| Experiment Setup | Yes | Setup. We evaluate our MOCA method by training ViT-B/16 models on the ImageNet-1k (Russakovsky et al., 2015) dataset. We use the AdamW optimizer (Loshchilov & Hutter, 2019) with β1 = 0.9, β2 = 0.999 and weight decay 0.05. The batch size is 2048 split over 8 A100 GPUs. For the learning rate lr, we use a linear warm-up from 0 to its peak value for 30 epochs and then decrease it over the remaining epochs with a cosine annealing schedule. The peak lr is 1.5 × 10⁻⁴ and the number of training epochs is 100 or 200. More implementation details are provided in the appendix. In Tab. 11a we provide the implementation details for the ViT-B/16-based MOCA model that we used for producing the results of Sec. 4.4 in the main paper and in Tab. 11b the optimization setting for its training. |
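The learning-rate schedule quoted in the Experiment Setup row (linear warm-up from 0 to a peak of 1.5 × 10⁻⁴ over 30 epochs, then cosine annealing over the remaining epochs) can be written out explicitly. The sketch below is an illustration of that schedule shape only, not the authors' exact training code; the function name and per-epoch granularity are assumptions.

```python
import math

def lr_at_epoch(epoch, peak_lr=1.5e-4, warmup_epochs=30, total_epochs=100):
    """Warm-up + cosine schedule as described in the MOCA setup:
    linear ramp from 0 to peak_lr, then cosine decay toward 0."""
    if epoch < warmup_epochs:
        # Linear warm-up phase.
        return peak_lr * epoch / warmup_epochs
    # Cosine annealing over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

# The schedule starts at 0, reaches peak_lr right after warm-up,
# and decays back toward 0 by the final epoch.
start, peak, end = lr_at_epoch(0), lr_at_epoch(30), lr_at_epoch(100)
```

In practice this per-epoch rule would be evaluated per optimizer step (or via PyTorch's built-in schedulers), but the shape is the same.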
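The Research Type row mentions evaluating frozen representations with k-NN ImageNet classification. As a toy illustration of what such an evaluation does, here is a minimal similarity-weighted k-NN vote over feature vectors. This is a generic sketch, not MOCA's exact protocol (which may use a different k, distance, or weighting); the function name and the tiny 2-D "features" are hypothetical.

```python
def knn_classify(query, train_feats, train_labels, k=3):
    """Classify `query` by a cosine-similarity weighted vote among its
    k nearest training features (features assumed L2-normalized, so the
    dot product equals cosine similarity)."""
    sims = [sum(q * t for q, t in zip(query, feat)) for feat in train_feats]
    # Indices of the k most similar training features.
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    # Accumulate similarity-weighted votes per class label.
    votes = {}
    for i in top:
        votes[train_labels[i]] = votes.get(train_labels[i], 0.0) + sims[i]
    return max(votes, key=votes.get)

# Toy frozen "features" (unit-norm 2-D vectors) with two classes.
train_feats = [(1.0, 0.0), (0.8, 0.6), (0.0, 1.0)]
train_labels = ["cat", "cat", "dog"]
```

In the real evaluation the features come from the frozen ViT-B/16 backbone over the full ImageNet-1k train/val sets; only the scale differs.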