Collapse-Proof Non-Contrastive Self-Supervised Learning

Authors: Emanuele Sansone, Tim Lebailly, Tinne Tuytelaars

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We validate our theoretical findings on image datasets, including SVHN, CIFAR-10, CIFAR-100, and ImageNet-100. Our approach effectively combines the strengths of feature decorrelation and cluster-based self-supervised learning methods, overcoming training failure modes while achieving strong generalization in clustering and linear classification tasks. The experimental analysis is divided into four main parts. Firstly, we compare CPLearn against non-contrastive approaches from the families of feature decorrelation and cluster-based methods on three image datasets, i.e. SVHN (Netzer et al., 2011), CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009)."
Researcher Affiliation | Academia | "Department of Electrical Engineering (ESAT), KU Leuven, Belgium; CSAIL, MIT, US."
Pseudocode | Yes | Figure 2 caption: "In CPLearn, minimizing the proposed objective together with the corresponding projector ensures that the embedding representations are clustered and, at the same time, that their features are decorrelated. This guarantees that the representations are collapse-proof, meaning that dimensional, cluster, intra-cluster and representation collapses are prevented." The paper also provides Algorithm 1, "Pseudocode for CPLearn".
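The quoted caption describes an objective that simultaneously clusters embeddings and decorrelates their features. As an illustrative sketch only, not the paper's actual Algorithm 1: the off-diagonal covariance penalty below follows the common feature-decorrelation recipe, and the two entropy terms are a standard heuristic for confident yet balanced cluster assignments; the function names and coefficients are invented for this example.

```python
import numpy as np

def decorrelation_loss(z, off_diag_weight=0.005):
    # z: (N, D) embeddings; standardize each feature dimension
    z = (z - z.mean(0)) / (z.std(0) + 1e-8)
    c = (z.T @ z) / len(z)                       # (D, D) feature covariance
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()    # keep per-feature variance alive
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()  # decorrelate features
    return on_diag + off_diag_weight * off_diag

def cluster_loss(z, prototypes):
    # soft assignments of embeddings to cluster prototypes
    logits = z @ prototypes.T
    p = np.exp(logits - logits.max(1, keepdims=True))
    p /= p.sum(1, keepdims=True)
    # low sample entropy => confident assignments (fights intra-cluster collapse)
    sample_entropy = -(p * np.log(p + 1e-8)).sum(1).mean()
    # high marginal entropy => balanced clusters (fights cluster collapse)
    marginal = p.mean(0)
    marginal_entropy = -(marginal * np.log(marginal + 1e-8)).sum()
    return sample_entropy - marginal_entropy

rng = np.random.default_rng(0)
z = rng.normal(size=(256, 32))          # stand-in batch of embeddings
protos = rng.normal(size=(10, 32))      # stand-in cluster prototypes
total = decorrelation_loss(z) + cluster_loss(z, protos)
```

In a training loop, `total` would be minimized end-to-end over the backbone, projector, and prototypes.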
Open Source Code | No | "We use the repository from (da Costa et al., 2022) for SVHN and CIFAR experiments, and the one from (Caron et al., 2021) for ImageNet-100 experiments." This refers to external codebases the authors built on, not a code release for the presented method.
Open Datasets | Yes | "We validate our theoretical findings on image datasets, including SVHN, CIFAR-10, CIFAR-100, and ImageNet-100. We use a ResNet-18 backbone network with f = 128 for SVHN and CIFAR-10, and with f = 256 for CIFAR-100, following the methodology from (Sansone, 2023). For ImageNet-100, we use a standard small ViT with f = 384, following the methodology from (Caron et al., 2021)."
Dataset Splits | Yes | "The experimental analysis is divided into four main parts. Firstly, we compare CPLearn against non-contrastive approaches from the families of feature decorrelation and cluster-based methods on three image datasets, i.e. SVHN (Netzer et al., 2011), CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009)." Figure 4 shows a "realization of embedding covariance (left) and adjacency matrices (right) for the whole CIFAR-10 test dataset." On evaluation: "For linear probe evaluation, we followed standard practice by removing the projector head and training a linear classifier on the backbone representation."
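The linear probe protocol quoted above trains a linear classifier on frozen backbone features. As a minimal sketch under stated assumptions: the random `features` array stands in for frozen backbone representations, and a closed-form ridge regression on one-hot labels replaces the SGD-trained linear classifier typically used in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, feat_dim = 10, 128

# stand-ins for frozen backbone features and ground-truth labels
features = rng.normal(size=(1000, feat_dim))
labels = rng.integers(0, num_classes, size=1000)

# linear probe: ridge regression onto one-hot targets (closed form)
Y = np.eye(num_classes)[labels]
lam = 1e-3
W = np.linalg.solve(features.T @ features + lam * np.eye(feat_dim),
                    features.T @ Y)

preds = (features @ W).argmax(1)
acc = (preds == labels).mean()  # probe accuracy on these features
```

With real (non-random) backbone features, `acc` measures how linearly separable the learned representation is.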
Hardware Specification | Yes | "We used a ViT-small backbone network and train it for 100 epochs with learning rate equal to 5e-4 and batch size per GPU equal to 64 on a node with 8 NVIDIA A100 GPUs."
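The quoted setup gives the batch size per GPU, not the global batch size; the latter follows directly from the stated numbers:

```python
# per-GPU batch of 64 across the stated 8 A100 GPUs
per_gpu_batch = 64
num_gpus = 8
global_batch = per_gpu_batch * num_gpus  # 512
```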
Software Dependencies | No | The paper mentions software by name (e.g., "PyTorch-like pseudo-code", "Adam optimizer", "solo-learn", "DINO codebase") but does not provide specific version numbers for these components.
Experiment Setup | Yes | "We use a ResNet-18 backbone network with f = 128 for SVHN and CIFAR-10, and with f = 256 for CIFAR-100... The β parameter in Eq. 7 is chosen from the range {0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10}... We used a ResNet-18 backbone network on CIFAR-10 and train it for 1000 epochs with Adam optimizer, learning rate equal to 1e-3 and batch size equal to 64 on 1 A100 GPU." (Table 6 in Appendix I lists further hyperparameters such as batch size, epochs, Adam betas, learning rate, and data augmentation parameters.)
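The quoted CIFAR-10 setup could be collected into a single config; a hedged sketch where the field names are invented for illustration, while the values are the ones quoted above:

```python
# illustrative config dict; keys are made up, values come from the quoted setup
config = {
    "backbone": "resnet18",
    "optimizer": "adam",
    "learning_rate": 1e-3,
    "batch_size": 64,
    "epochs": 1000,
    "gpus": 1,
    # grid from which the β weight of Eq. 7 is selected
    "beta_grid": [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
}

# a sweep would launch one run per β value
runs = [{**config, "beta": b} for b in config["beta_grid"]]
```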