Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning

Authors: Amin Karimi Monsefi, Mengxi Zhou, Nastaran Monsefi, Ser-Nam Lim, Wei-Lun Chao, Rajiv Ramnath

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results demonstrate the effectiveness of FOLK in achieving competitive performance to many state-of-the-art SSL methods across various downstream tasks, including image classification, few-shot learning, and semantic segmentation. (Abstract) ... Through extensive experimentation, we demonstrate the efficacy of FOLK. Our findings indicate that FOLK performs on par or better than many state-of-the-art MIM and MFM techniques in various downstream tasks, including image classification, few-shot learning, and semantic segmentation. (Section 1 - Contributions) ... In this section, we detail the experimental setup and evaluate our proposed FOLK framework on classification tasks using both full fine-tuning and few-shot learning approaches. Additional experiments, including semantic segmentation as a downstream task and ablation studies, are provided in Appendix B for further insights and comprehensive results. (Section 4 - Experiments)
Researcher Affiliation | Academia | Amin Karimi Monsefi, Mengxi Zhou, Nastaran Karimi Monsefi, Ser-Nam Lim, Wei-Lun Chao, Rajiv Ramnath; EMAIL, EMAIL; The Ohio State University, Hamedan University of Technology, University of Central Florida
Pseudocode | Yes | The generation of Com and RCom filters is illustrated in Fig. 2, with the pseudocode available in Appendix A.4. (Section 3.2.1) ... Algorithm 1 presents the pseudocode for our proposed Com and RCom filters (denoted as M in Eq. 2 and Eq. 3). (Appendix A.4)
Open Source Code | Yes | https://github.com/aminK8/FOLK (Abstract)
Open Datasets | Yes | We adopt the ImageNet-1K training dataset (Deng et al., 2009) without labels for pre-training our self-supervised learning. (Section 4) ... For image classification, we continue to leverage the ImageNet-1K dataset (Deng et al., 2009) to assess the generalizability and effectiveness of the learned features. In contrast, for semantic segmentation, we utilize the ADE20K dataset (Zhou et al., 2017), a standard benchmark in scene parsing and segmentation tasks. (Section 4) ... Threshold(s) CIFAR-10 CIFAR-100 ImageNet-1K (Table 8, Appendix B.5.1)
Dataset Splits | Yes | In this experiment, we aim to highlight FOLK's superior adaptability and efficiency by fine-tuning pre-trained models using only 10% of the ImageNet-1K dataset over 200 epochs. (Section 4.2.2) ... Table 7 presents an extended evaluation of few-shot learning performance using a smaller set of labeled data. Various pre-trained models were fine-tuned using only 1% of the ImageNet-1K dataset over 1000 epochs. (Appendix B.4) ... We ran 200 epochs for fine-tuning the pre-trained model (i.e. ViT-S/16 or ViT-B/16) on ImageNet-1K for image classification... (Appendix A.2.1) ... The full fine-tuning ViT-S/16 model for semantic segmentation task with ADE20K dataset. (Table 3, Appendix B.1)
Hardware Specification | Yes | Our computational infrastructure supports these extensive experiments, consisting of four nodes, each of which has four NVIDIA A100 80GB GPUs, in total 16 GPUs. (Section 4)
Software Dependencies | No | We used the PyTorch library (Paszke et al., 2019) for our code development. (Appendix A.1) The paper mentions the PyTorch library but does not specify its version number or any other software dependencies with version numbers.
Experiment Setup | Yes | We employ the AdamW optimizer (Loshchilov & Hutter, 2019), with a pre-training duration set to 300 or 800 epochs, a batch size of 2048 (128 per GPU), and a peak learning rate of 1.2 × 10⁻³. Additional parameters include a cosine decay learning rate schedule, 20 warmup epochs, and a specific setting for optimizer momentum, (β1, β2) = (0.9, 0.95) (Chen et al., 2020a), with a weight decay of 0.05. Also, we used a value of 3.0 for gradient clipping to prevent the exploding gradient problem. (Appendix A.1 - Pre-train Stage) ... We ran 200 epochs for fine-tuning the pre-trained model (i.e. ViT-S/16 or ViT-B/16) on ImageNet-1K for image classification, employing the AdamW optimizer across all configurations with a weight decay of 0.05 and the optimizer momentum (β1, β2) = (0.9, 0.999). Moreover, the approach includes a cosine decay learning rate schedule (Li & Arora, 2020), with a layer-wise learning rate decay equal to 0.8 (Bao et al., 2021; Clark et al., 2020). We also utilized advanced augmentation techniques such as Mixup (Zhang et al., 2018) and CutMix (Yun et al., 2019), as well as label smoothing and random augmentation to further improve model robustness and generalization capability (Szegedy et al., 2016; Cubuk et al., 2020). The batch size is maintained at 2048, with a peak learning rate set at 8 × 10⁻³. (Appendix A.2.1 - Classification Task) ... L_tot = α L_dis + L_MFM (Eq. 6), where the hyperparameter α controls the weight between the two loss terms and is set to 1 in our experiments, unless stated otherwise. (Section 3.2.3)
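The Com and RCom filters quoted in the Pseudocode row operate in the frequency domain; the paper's actual pseudocode lives in its Appendix A.4 and is not reproduced here. As a rough illustration only, the sketch below builds a generic radial low-pass/high-pass binary mask of the kind frequency-guided masking schemes apply after a 2D FFT. The function name, the threshold semantics, and the radial-distance criterion are all assumptions for illustration, not the paper's Com/RCom definition.

```python
import math

def radial_frequency_mask(h, w, threshold, keep_low=True):
    """Binary (h, w) frequency-domain mask for an unshifted 2D FFT grid.

    Frequencies whose normalized radial distance from the DC component
    is at most `threshold` are kept when keep_low=True (low-pass);
    the complementary set is kept otherwise (high-pass).
    NOTE: hypothetical helper, not the paper's Com/RCom filter.
    """
    mask = [[0.0] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            # Signed frequency coordinates in [-0.5, 0.5), matching the
            # layout of numpy.fft.fftfreq (DC at index 0, no fftshift).
            fu = (u - h) / h if u >= (h + 1) // 2 else u / h
            fv = (v - w) / w if v >= (w + 1) // 2 else v / w
            r = math.sqrt(fu * fu + fv * fv)
            inside = r <= threshold
            mask[u][v] = 1.0 if inside == keep_low else 0.0
    return mask
```

By construction the low-pass mask and its high-pass complement partition the spectrum, which mirrors how a "Com" filter and its reverse ("RCom") would mask complementary frequency bands.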
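The pre-training schedule quoted in the Experiment Setup row (peak learning rate 1.2 × 10⁻³, 20 warmup epochs, cosine decay over a 300-epoch run) can be written out as a small standalone function. This is a minimal sketch of a standard linear-warmup plus cosine-decay schedule using the quoted numbers as defaults; it is not the authors' code, and the function name and `min_lr` floor are assumptions.

```python
import math

def lr_at_epoch(epoch, peak_lr=1.2e-3, warmup_epochs=20,
                total_epochs=300, min_lr=0.0):
    """Learning rate at a given epoch: linear warmup to peak_lr over
    warmup_epochs, then cosine decay toward min_lr (hypothetical sketch
    of the schedule described in the paper's Appendix A.1)."""
    if epoch < warmup_epochs:
        # Linear ramp: reaches peak_lr at the last warmup epoch.
        return peak_lr * (epoch + 1) / warmup_epochs
    # Cosine decay over the remaining epochs.
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

For the fine-tuning stage the same shape would apply with a peak of 8 × 10⁻³ and 200 epochs, with the quoted layer-wise decay of 0.8 then scaling this base rate per transformer block.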