A Unified View of Masked Image Modeling
Authors: Zhiliang Peng, Li Dong, Hangbo Bao, Furu Wei, Qixiang Ye
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on image classification and semantic segmentation show that Mask Distill achieves performance comparable or superior to state-of-the-art methods. We conduct extensive experiments on downstream tasks including ImageNet fine-tuning and semantic segmentation. Experimental results show that the proposed approach improves performance across various settings. |
| Researcher Affiliation | Collaboration | Zhiliang Peng (University of Chinese Academy of Sciences); Li Dong (Microsoft Research); Hangbo Bao (Microsoft Research); Furu Wei (Microsoft Research); Qixiang Ye (University of Chinese Academy of Sciences) |
| Pseudocode | No | The paper describes the Mask Distill method using mathematical equations and textual descriptions, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and pretrained models will be available at https://aka.ms/unimim. |
| Open Datasets | Yes | We conduct MIM pretraining on ImageNet-1k (Russakovsky et al., 2015) for base-, large-, and huge-size ViTs. After that, we evaluate pretrained models on downstream visual tasks: image classification on ImageNet-1k and semantic segmentation on ADE20k (Zhou et al., 2019). |
| Dataset Splits | Yes | We consider the popular evaluation protocol for image classification on the ImageNet-1k dataset: fine-tuning top-1 accuracy. For the semantic segmentation task, we evaluate the mIoU metric on the ADE20K dataset (Zhou et al., 2019) with the UperNet (Xiao et al., 2018) framework. We use the entire validation set for evaluation. |
| Hardware Specification | No | The paper mentions 'GPU memory (G)' and 'batch size 64 on each GPU' in Table 7, indicating the use of GPUs, but does not provide specific details on the GPU models, CPU, or other hardware used for experiments. |
| Software Dependencies | No | The paper mentions software components like the 'AdamW optimizer' and refers to various models/frameworks (e.g., 'ViT', 'Swin Transformers'), but it does not provide specific version numbers for any key software dependencies or libraries used for implementation. |
| Experiment Setup | Yes | For the pretraining setting, we mainly follow BEiT (Bao et al., 2022; Peng et al., 2022): batch size 2048, learning rate 1.5e-3, AdamW optimizer with weight decay 0.05, drop path 0.1 (0.2) for ViT-Base (Large), block-wise masking of 40% of patches, epochs 300/800. More details can be found in the Appendix. |
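The reported pretraining hyperparameters can be collected into a single config for reference. The sketch below is illustrative only: the dictionary keys and the helper function are hypothetical names, not taken from the paper's released code, and the patch count assumes a standard ViT-Base setup (224×224 input, 16×16 patches).

```python
# Hypothetical summary of the BEiT-style pretraining hyperparameters
# reported in the paper; key names are illustrative, not from aka.ms/unimim.
PRETRAIN_CONFIG = {
    "optimizer": "AdamW",
    "batch_size": 2048,
    "learning_rate": 1.5e-3,
    "weight_decay": 0.05,
    "drop_path": {"vit_base": 0.1, "vit_large": 0.2},
    "mask_strategy": "block-wise",
    "mask_ratio": 0.40,
    "epochs": [300, 800],
}

def masked_patch_count(num_patches: int, mask_ratio: float) -> int:
    """Number of patches hidden under the stated mask ratio."""
    return int(num_patches * mask_ratio)

# A ViT-Base on 224x224 images with 16x16 patches yields 14*14 = 196 patches;
# a 40% block-wise mask therefore hides roughly 78 of them.
print(masked_patch_count(196, PRETRAIN_CONFIG["mask_ratio"]))  # → 78
```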