Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods
Authors: Yuchen Lu, Zhen Liu, Aristide Baratin, Romain Laroche, Aaron Courville, Alessandro Sordoni
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through a large-scale empirical study with a diverse family of SSL algorithms, we find that CLID better correlates with in-distribution model performance than other competing recent evaluation schemes. We also benchmark CLID on out-of-domain generalization, where CLID serves as a predictor of the transfer performance of SSL models on several visual classification tasks, yielding improvements with respect to the competing baselines. |
| Researcher Affiliation | Collaboration | Yuchen Lu (Mila, University of Montreal); Zhen Liu (Mila, University of Montreal); Aristide Baratin (SAIT AI Lab, Montreal); Romain Laroche (Microsoft Research); Aaron Courville (Mila, University of Montreal, CIFAR); Alessandro Sordoni (Microsoft Research, Mila) |
| Pseudocode | No | The paper describes methods like Intrinsic Dimension (ID) and Cluster Learnability (CL) using mathematical formulations and descriptive text, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper references official implementations for baseline methods (Wang & Isola (2020) and MCR2 (Yu et al., 2020)) with URLs, but it does not provide explicit source code for the methodology (CLID) described in this paper. There is no statement of release for their own code. |
| Open Datasets | Yes | We select in total 28 self-supervised learning checkpoints trained on ImageNet over different algorithms, architectures, and training epochs. A complete list can be found in Table 4 in the appendix. [...] We collect 7 out-of-domain downstream visual classification tasks. |
| Dataset Splits | Yes | We use the KNN evaluation on the validation data using the ground-truth labels to measure the performance of the model, which has been shown to be well correlated with the linear evaluation but computationally less expensive (Caron et al., 2021). [...] We re-use the dataset split to assess the performance of a KNN classifier on this labelled dataset. |
| Hardware Specification | Yes | All our experiments are computed on a single V100 GPU. |
| Software Dependencies | No | The paper mentions using K-means clustering, KNN classifier, and MINE (Mutual Information Neural Estimation), but it does not specify any version numbers for these software components or other libraries/frameworks used. |
| Experiment Setup | Yes | For the computation of cluster learnability, we choose the square root of the dataset size as the number of clusters in K-means. We report results with 1 neighbor for our KNN learner. We normalize the features and use cosine distance for the K-means clustering and KNN learner. [...] We follow the official implementation with α = 2 and t = 2 as default values for the tunable parameters in Eqn 1. [...] We use a batch size of 128, learning rate 0.0005 and weight decay 0.001. The network is trained for 50000 steps on the training images, and we report MINE on the validation data. |
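The cluster-learnability setup quoted above (K-means with sqrt(N) clusters on normalized features, cosine distance, a 1-NN learner predicting the pseudo-labels of held-out points) can be sketched as follows. This is a minimal NumPy illustration of that recipe, not the authors' released code; the function name `cluster_learnability` and the 50/50 train/held-out split are assumptions for the sketch.

```python
import numpy as np

def cluster_learnability(feats, n_iters=20, seed=0):
    """Hypothetical sketch: cluster features with spherical K-means
    (k = sqrt(N), cosine distance on L2-normalized features), then score
    how well a 1-NN learner fit on half the data predicts the cluster
    pseudo-labels of the other half."""
    rng = np.random.default_rng(seed)
    # Normalize so dot products equal cosine similarity.
    X = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12)
    n = len(X)
    k = max(2, int(np.sqrt(n)))  # sqrt of dataset size, per the paper
    # Spherical K-means: Lloyd iterations, assigning by max cosine similarity.
    centers = X[rng.choice(n, size=k, replace=False)].copy()
    for _ in range(n_iters):
        labels = np.argmax(X @ centers.T, axis=1)
        for c in range(k):
            members = X[labels == c]
            if len(members):
                m = members.mean(axis=0)
                centers[c] = m / (np.linalg.norm(m) + 1e-12)
    labels = np.argmax(X @ centers.T, axis=1)
    # 1-NN learner (cosine distance) trained on half the pseudo-labeled data,
    # evaluated on the held-out half.
    perm = rng.permutation(n)
    tr, te = perm[: n // 2], perm[n // 2:]
    nearest = np.argmax(X[te] @ X[tr].T, axis=1)
    preds = labels[tr][nearest]
    return float((preds == labels[te]).mean())
```

A higher score means the K-means pseudo-labels are easier for a simple learner to pick up, which is the learnability signal CLID combines with intrinsic dimension.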