A Theoretical Analysis of Self-Supervised Learning for Vision Transformers
Authors: Yu Huang, Zixin Wen, Yuejie Chi, Yingbin Liang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The main body of the paper presents only theoretical results; full proofs, along with proof sketches that offer intuitive explanations of the proof steps, are provided in the appendices. The appendices also contain experimental results, with detailed descriptions of the experimental settings to facilitate reproduction. |
| Researcher Affiliation | Academia | University of Pennsylvania; Carnegie Mellon University; The Ohio State University |
| Pseudocode | No | The paper focuses on theoretical analysis and proof techniques, describing gradient descent dynamics and attention correlations, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The reproducibility statement mentions "detailed descriptions of the experimental settings to facilitate result reproduction" but does not include any explicit statement of code release, a link to a repository, or mention of code in supplementary materials for the methodology described in the paper. |
| Open Datasets | Yes | Setup. In this work, we compare the performance of the ViT-B/16 encoder pre-trained on ImageNet1K (Russakovsky et al., 2015) among the following four models: a masked reconstruction model (MAE), a contrastive learning model (MoCo v3; Chen et al., 2021b), another self-supervised model (DINO; Caron et al., 2021), and a supervised model (DeiT; Touvron et al., 2021). |
| Dataset Splits | No | The paper mentions using ViT-B/16 encoder pre-trained on ImageNet1K and analyzing attention focus across 152 example images, but it does not specify any training/test/validation splits for these images or how they were selected from the dataset. |
| Hardware Specification | No | The paper, including its experimental section and reproducibility statement, does not provide any specific details about the hardware (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper, including its experimental section and reproducibility statement, does not list any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper describes the models used (MAE, MoCo v3, DINO, DeiT) and the focus of the analysis (12 different attention heads in the last layer of ViT-B on 152 example images), but it lacks specific experimental setup details such as hyperparameters (e.g., learning rates, batch sizes) for their analysis or model training. |
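The analysis described above inspects per-head attention maps in the last layer of a ViT-B/16. Since the paper releases no code, the sketch below illustrates one plausible way such maps could be computed: given the token embeddings entering a transformer block and its fused QKV projection, it recovers each head's softmax attention from the [CLS] token to the patch tokens. The function name, the use of random stand-in weights, and the dimensions are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(z, axis=-1):
    # numerically stable softmax
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cls_attention_per_head(x, qkv_weight, num_heads):
    """x: (batch, tokens, dim) embeddings entering a transformer block.
    qkv_weight: (3*dim, dim) fused query/key/value projection (hypothetical).
    Returns (batch, num_heads, tokens-1): attention of the [CLS] token
    (index 0) over the patch tokens, one map per head."""
    b, n, d = x.shape
    head_dim = d // num_heads
    qkv = x @ qkv_weight.T                       # (b, n, 3d)
    q, k = qkv[..., :d], qkv[..., d:2 * d]       # value path not needed here
    # split into heads: (b, heads, n, head_dim)
    q = q.reshape(b, n, num_heads, head_dim).transpose(0, 2, 1, 3)
    k = k.reshape(b, n, num_heads, head_dim).transpose(0, 2, 1, 3)
    attn = softmax(q @ k.transpose(0, 1, 3, 2) / head_dim ** 0.5)
    return attn[:, :, 0, 1:]                     # [CLS] row, patch columns

# toy dimensions matching ViT-B/16: dim=768, 12 heads, 196 patches + [CLS]
x = np.random.randn(1, 197, 768)
w = np.random.randn(3 * 768, 768) * 0.02        # stand-in weights
maps = cls_attention_per_head(x, w, num_heads=12)
print(maps.shape)                               # (1, 12, 196)
```

In a real reproduction the embeddings and QKV weights would come from the pre-trained checkpoints (e.g., via forward hooks on the last block), and the 12 resulting maps per image would then be summarized over the 152 example images.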