A Unified View of Masked Image Modeling
Authors: Zhiliang Peng, Li Dong, Hangbo Bao, Furu Wei, Qixiang Ye
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on image classification and semantic segmentation show that Mask Distill achieves performance comparable or superior to state-of-the-art methods. We conduct extensive experiments on downstream tasks including ImageNet fine-tuning and semantic segmentation. Experimental results show that the proposed approach improves performance across various settings. |
| Researcher Affiliation | Collaboration | Zhiliang Peng (University of Chinese Academy of Sciences); Li Dong (Microsoft Research); Hangbo Bao (Microsoft Research); Furu Wei (Microsoft Research); Qixiang Ye (University of Chinese Academy of Sciences) |
| Pseudocode | No | The paper describes the Mask Distill method using mathematical equations and textual descriptions, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and pretrained models will be available at https://aka.ms/unimim. |
| Open Datasets | Yes | We conduct MIM pretraining on ImageNet-1k (Russakovsky et al., 2015) for base-, large-, and huge-size ViTs. After that, we evaluate pretrained models on downstream visual tasks: image classification on ImageNet-1k and semantic segmentation on ADE20k (Zhou et al., 2019). |
| Dataset Splits | Yes | We consider the popular evaluation protocol for image classification on the ImageNet-1k dataset: fine-tuning top-1 accuracy. For the semantic segmentation task, we evaluate the mIoU metric on the ADE20K dataset (Zhou et al., 2019) with the UperNet (Xiao et al., 2018) framework. We use the entire validation set for evaluation. |
| Hardware Specification | No | The paper mentions 'GPU memory (G)' and 'batch size 64 on each GPU' in Table 7, indicating the use of GPUs, but does not provide specific details on the GPU models, CPU, or other hardware used for experiments. |
| Software Dependencies | No | The paper mentions software components like the 'AdamW optimizer' and refers to various models/frameworks (e.g., 'ViT', 'Swin Transformers'), but it does not provide specific version numbers for any key software dependencies or libraries used for implementation. |
| Experiment Setup | Yes | For the pretraining setting, we mainly follow BEiT (Bao et al., 2022; Peng et al., 2022): batch size 2048, learning rate 1.5e-3, AdamW optimizer with weight decay 0.05, drop path 0.1 (0.2) for ViT-Base (Large), block-wise masking of 40% of patches, epochs 300/800. More details can be found in the Appendix. |
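The reported pretraining hyperparameters can be collected into a single config for reference. The sketch below is illustrative only: the dictionary keys and the helper function are hypothetical names, not taken from the paper's released code, and the patch count assumes a standard ViT-Base setup (224×224 input, 16×16 patches).

```python
# Hypothetical summary of the BEiT-style pretraining hyperparameters
# reported in the paper; key names are illustrative, not from aka.ms/unimim.
PRETRAIN_CONFIG = {
    "optimizer": "AdamW",
    "batch_size": 2048,
    "learning_rate": 1.5e-3,
    "weight_decay": 0.05,
    "drop_path": {"vit_base": 0.1, "vit_large": 0.2},
    "mask_strategy": "block-wise",
    "mask_ratio": 0.40,
    "epochs": [300, 800],
}

def masked_patch_count(num_patches: int, mask_ratio: float) -> int:
    """Number of patches hidden under the stated mask ratio."""
    return int(num_patches * mask_ratio)

# A ViT-Base on 224x224 images with 16x16 patches yields 14*14 = 196 patches;
# a 40% block-wise mask therefore hides roughly 78 of them.
print(masked_patch_count(196, PRETRAIN_CONFIG["mask_ratio"]))  # → 78
```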