Preventing Dimensional Collapse in Self-Supervised Learning via Orthogonality Regularization
Authors: Junlin He, Jinxiao Du, Wei Ma
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical investigations demonstrate that OR significantly enhances the performance of SSL methods across diverse benchmarks, yielding consistent gains with both CNNs and Transformer-based architectures. |
| Researcher Affiliation | Academia | Junlin He, The Hong Kong Polytechnic University, Hong Kong SAR, China (EMAIL); Jinxiao Du, The Hong Kong Polytechnic University, Hong Kong SAR, China (EMAIL); Wei Ma, The Hong Kong Polytechnic University, Hong Kong SAR, China (EMAIL) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code will be released at https://github.com/Umaruchain/OR_in_SSL.git. |
| Open Datasets | Yes | We pretrain SSL methods on CIFAR-10, CIFAR-100, IMAGENET-100 and IMAGENET-1k and evaluate transfer learning scenarios on datasets including CIFAR-100, CIFAR-10 (Krizhevsky et al. 2009), Food-101 (Bossard et al. 2014), Flowers-102 (Nilsback & Zisserman 2008), DTD (Cimpoi et al. 2014), GTSRB (Stallkamp et al. 2012). |
| Dataset Splits | Yes | The splits of the training and test set follow torchvision (Marcel & Rodriguez 2010). For OR, γ of SRIP is tuned from {1e-3, 1e-4, 1e-5} and γ of SO is tuned from {1e-5, 1e-6, 1e-7} on a validation set. |
| Hardware Specification | Yes | Our experiments were all completed on four NVIDIA RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions using 'Solo-learn' and 'Lightly SSL' frameworks, and 'detectron2', but does not specify their version numbers or other crucial software dependencies with version details needed for exact replication. |
| Experiment Setup | Yes | For OR, γ of SRIP is tuned from {1e-3, 1e-4, 1e-5} and γ of SO is tuned from {1e-5, 1e-6, 1e-7} on a validation set. When training the linear classifier, we use 100 epochs, a weight decay of 0.0005, a learning rate of 0.1 (divided by a factor of 10 at epochs 60 and 100), a batch size of 256, and SGD with Nesterov momentum as the optimizer (on IMAGENET-1k, we use a batch size of 128 and a learning rate of 0.2). |
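For context, the two orthogonality regularizers named in the table can be sketched as follows. This is an illustrative NumPy reconstruction based on the standard definitions of Soft Orthogonality (SO), the Frobenius-norm penalty ‖WᵀW − I‖²_F, and SRIP, the spectral norm of (WᵀW − I) approximated via power iteration; the function names, the `iters` parameter, and the default γ values (taken from the tuning grids above) are assumptions, not the authors' released implementation (see the linked repository for that).

```python
import numpy as np

def soft_orthogonality(W, gamma=1e-5):
    """SO penalty: gamma * ||W^T W - I||_F^2 (illustrative sketch)."""
    G = W.T @ W - np.eye(W.shape[1])
    return gamma * np.sum(G ** 2)

def srip(W, gamma=1e-4, iters=10, seed=0):
    """SRIP penalty: gamma * sigma_max(W^T W - I).

    The largest singular value is approximated with a few steps of
    power iteration, which is the usual cheap surrogate for the
    exact spectral norm during training.
    """
    A = W.T @ W - np.eye(W.shape[1])
    v = np.random.default_rng(seed).standard_normal(W.shape[1])
    for _ in range(iters):
        w = A @ v
        n = np.linalg.norm(w)
        if n < 1e-12:          # A is (numerically) zero: W is orthogonal
            return 0.0
        v = w / n
    return gamma * np.linalg.norm(A @ v)

# Both penalties vanish for a column-orthogonal W and are added,
# scaled by gamma, to the SSL loss during pretraining.
```

A perfectly orthogonal weight matrix (e.g. the identity) incurs zero penalty under both terms, while a rank-deficient W, the dimensional-collapse case the paper targets, is penalized.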