Clustering Properties of Self-Supervised Learning
Authors: Xi Weng, Jianing An, Xudong Ma, Binhang Qi, Jie Luo, Xi Yang, Jin Song Dong, Lei Huang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on standard SSL benchmarks reveal that models pretrained with ReSA outperform other state-of-the-art SSL methods by a significant margin. Finally, we analyze how ReSA facilitates better clustering properties, demonstrating that it effectively enhances clustering performance at both fine-grained and coarse-grained levels, shaping representations that are inherently more structured and semantically meaningful. |
| Researcher Affiliation | Academia | SKLCCSE, School of Artificial Intelligence, Beihang University; School of Computing, National University of Singapore; Beijing Academy of Artificial Intelligence; Hangzhou International Innovation Institute, Beihang University. Correspondence to: Lei Huang <EMAIL>. |
| Pseudocode | Yes | For clarity, we first provide the algorithm of ReSA in PyTorch-style pseudo-code: |
| Open Source Code | Yes | Our code is available at https://github.com/winci-ai/resa |
| Open Datasets | Yes | We perform pretraining from scratch on a variety of datasets, including CIFAR-10/100, ImageNet-100, and ImageNet (Deng et al., 2009), utilizing diverse encoders such as ConvNets and the ViT. Furthermore, we compare the performance of ReSA with state-of-the-art SSL methods across a range of downstream tasks, e.g. linear probe evaluation and transfer learning. The full PyTorch-style algorithm as well as details of implementation are provided in Appendix B. |
| Dataset Splits | Yes | When evaluating on CIFAR-10/100, we adopt the same linear evaluation protocol as in W-MSE (Ermolov et al., 2021) and INTL (Weng et al., 2024): training a linear classifier for 500 epochs on each labeled dataset using the Adam optimizer, without data augmentation. We further evaluate the low-shot learning capability of ReSA in semi-supervised classification. Specifically, we fine-tune the pre-trained ReSA encoder and train a linear classifier for 20 epochs, using 1% and 10% subsets of ImageNet, following the same splits as SimCLR (Chen et al., 2020b). |
| Hardware Specification | Yes | The table is mostly inherited from solo-learn (da Costa et al., 2022). All methods are based on ResNet-18 with two augmented views generated per sample and are trained for 1000 epochs on CIFAR-10/100 with a batch size of 256, and for 400 epochs on ImageNet-100 with a batch size of 128. The bold values indicate the best performance, and the underlined values represent the second-highest accuracy. Table 4. Comparison of computational overhead among various SSL methods. For fairness, we set the batch size to 1024 with two 224×224 augmented views pretraining on ImageNet, and perform all measurements, including peak memory (GB per GPU) and training time (hours per epoch), on the same environment and machine equipped with 8 A100-PCIE-40GB GPUs using 32 data-loading workers under mixed precision. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers for reproducibility. It mentions using PyTorch-style pseudocode and optimizers like SGD and Adam, but no versions. |
| Experiment Setup | Yes | Universal settings. In all experiments conducted in Section 5, we adopt a momentum network, consistent with the practices of most existing self-supervised learning (SSL) methods (Grill et al., 2020; Chen et al., 2021; Caron et al., 2021; Liu et al., 2022; Weng et al., 2024). While the momentum network is not necessary to prevent collapse in ReSA, it has been shown to effectively promote long-term learning in SSL models (He et al., 2020; Chen & He, 2021). The momentum coefficient, temperature, and Sinkhorn-Knopp parameters in ReSA are configured in accordance with the pseudo-code provided earlier, without requiring further tuning. Furthermore, a standard three-layer MLP is employed as the projector, featuring a hidden layer dimension of 2048 and an output embedding dimension of 512. Table 8. Optimizer-related parameters in ReSA pretraining: CIFAR-10 (ResNet-18): SGD, batch size 256, base lr 0.3, weight decay 1e-4, 2-epoch warmup; CIFAR-100 (ResNet-18): SGD, batch size 256, base lr 0.3, weight decay 1e-4, 2-epoch warmup; ImageNet-100 (ResNet-18): SGD, batch size 128, base lr 0.5, weight decay 2.5e-5, 2-epoch warmup; ImageNet (ResNet-50): SGD, batch size 256, base lr 0.5, weight decay 1e-5, 2-epoch warmup, or batch size 1024, base lr 0.5, weight decay 1e-5, 10-epoch warmup; ImageNet (ViT-S/16): AdamW, batch size 1024, base lr 5e-4, weight decay 0.1, 40-epoch warmup. |
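The momentum network mentioned in the experiment setup is typically maintained as an exponential moving average (EMA) of the online network's weights, as in BYOL and MoCo. A minimal pure-Python sketch of that update; the helper name `momentum_update` and the coefficient 0.996 are illustrative assumptions, not taken from the ReSA code:

```python
def momentum_update(online_params, target_params, m=0.996):
    """EMA update of the target (momentum) network.

    Each target parameter is pulled slightly toward its online
    counterpart; a larger momentum coefficient m means the target
    network changes more slowly between training steps.
    """
    return [m * t + (1.0 - m) * o
            for o, t in zip(online_params, target_params)]
```

In practice this runs once per training step, under no-gradient mode, over all encoder and projector tensors.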
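The setup also references Sinkhorn-Knopp parameters. Sinkhorn-Knopp alternately normalizes the columns and rows of a positive score matrix so it approaches a balanced (doubly normalized) assignment, which is how SwAV-style SSL methods spread samples evenly across clusters. A minimal pure-Python sketch under that assumption; the iteration count is illustrative, and ReSA's exact temperature scaling may differ:

```python
def sinkhorn_knopp(scores, n_iters=3):
    """Alternate column/row normalization of a positive matrix.

    After each full pass, every row sums to 1/n_rows, so the total
    mass is 1 and column (cluster) assignments stay balanced.
    """
    n_rows, n_cols = len(scores), len(scores[0])
    q = [row[:] for row in scores]
    for _ in range(n_iters):
        # Normalize each column to total mass 1/n_cols.
        col_sums = [sum(q[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for i in range(n_rows):
            for j in range(n_cols):
                q[i][j] /= col_sums[j] * n_cols
        # Normalize each row to total mass 1/n_rows.
        for i in range(n_rows):
            row_sum = sum(q[i])
            for j in range(n_cols):
                q[i][j] /= row_sum * n_rows
    return q
```

In SSL pipelines the input matrix is usually `exp(similarities / temperature)` over a batch, and the normalized output serves as soft cluster-assignment targets.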