PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling
Authors: Yuan Liu, Songyang Zhang, Jiacheng Chen, Kai Chen, Dahua Lin
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We thoroughly evaluate it with four well-established approaches: MAE (He et al., 2022), ConvMAE (Gao et al., 2022), MFF (Liu et al., 2023), and LSMAE (Hu et al., 2022). The experimental results demonstrate that PixMIM consistently enhances the performance of the baselines across various evaluation protocols, including linear probing and fine-tuning on ImageNet-1K (Deng et al., 2009), semantic segmentation on ADE20K (Zhou et al., 2018), and object detection on COCO (Lin et al., 2014). |
| Researcher Affiliation | Academia | Yuan Liu EMAIL Shanghai AI Laboratory Songyang Zhang EMAIL Shanghai AI Laboratory Jiacheng Chen EMAIL Simon Fraser University Kai Chen EMAIL Shanghai AI Laboratory Dahua Lin EMAIL Shanghai AI Laboratory The Chinese University of Hong Kong |
| Pseudocode | No | The paper describes the methodology in Section 4.1 'Low-frequency Target Generation' using text and mathematical formulas, but it does not include any clearly labeled pseudocode blocks or algorithms. |
| Open Source Code | No | Code and models will be available. |
| Open Datasets | Yes | The experimental results demonstrate that PixMIM consistently enhances the performance of the baselines across various evaluation protocols, including linear probing and fine-tuning on ImageNet-1K (Deng et al., 2009), semantic segmentation on ADE20K (Zhou et al., 2018), and object detection on COCO (Lin et al., 2014). |
| Dataset Splits | Yes | ImageNet-1K consists of 1.3M images of 1k categories and is split into training and validation sets. When applying our methods to MAE (He et al., 2022), MFF (Liu et al., 2023), ConvMAE (Gao et al., 2022), and LSMAE (Hu et al., 2022), we strictly follow their original pre-training and evaluation settings on ImageNet-1K to guarantee the fairness of experiments, including the pre-training schedule, network architecture, learning rate setup, fine-tuning protocols, etc. |
| Hardware Specification | No | The paper mentions 'limited computational resources' but does not specify any exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications used for running experiments. |
| Software Dependencies | No | The paper mentions software components like 'optimizer AdamW (Loshchilov & Hutter, 2019)', 'PyTorch (Paszke et al., 2019)', 'RandAug (Cubuk et al., 2020)', 'mixup (Zhang et al., 2018)', 'cutmix (Yun et al., 2019)', and 'drop path (Huang et al., 2016)'. However, it does not specify explicit version numbers for these software libraries or frameworks, which is necessary for a reproducible dependency listing. |
| Experiment Setup | Yes | In Section 5.1 Experiment Settings and Appendix B Full Implementation Details, the paper provides extensive and specific experimental setup details. For instance, Tables 7 and 9 specify: 'optimizer AdamW', 'base learning rate 1.5e-4', 'weight decay 0.05', 'optimizer momentum β1, β2=0.9, 0.95', 'batch size 4096', 'learning rate schedule cosine decay', 'warmup epochs 40', 'training epochs 100', 'augmentation RandAug (9, 0.5)', 'label smoothing 0.1', 'mixup 0.8', 'cutmix 1.0', and 'drop path 0.1', along with similar details for other training configurations and tasks. |
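The pre-training recipe quoted above can be captured as a plain configuration. A minimal sketch follows, assuming MAE's conventional linear learning-rate scaling rule (lr = base_lr × batch_size / 256) and a cosine-decay-with-warmup schedule; the paper's tables list only 'base learning rate' and 'cosine decay', so the scaling rule and schedule shape are assumptions carried over from the MAE baseline, not details confirmed by this review.

```python
import math

# Pre-training hyperparameters as reported in Table 7 of the paper.
config = {
    "optimizer": "AdamW",
    "base_lr": 1.5e-4,
    "weight_decay": 0.05,
    "betas": (0.9, 0.95),
    "batch_size": 4096,
    "warmup_epochs": 40,
    "epochs": 100,
}

# MAE-style linear scaling rule (an assumption; the paper lists only base_lr).
effective_lr = config["base_lr"] * config["batch_size"] / 256  # 2.4e-3

def lr_at_epoch(epoch: float) -> float:
    """Linear warmup followed by cosine decay, per the reported settings."""
    if epoch < config["warmup_epochs"]:
        return effective_lr * epoch / config["warmup_epochs"]
    progress = (epoch - config["warmup_epochs"]) / (
        config["epochs"] - config["warmup_epochs"]
    )
    return effective_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

This only restates the published numbers in executable form; an actual reproduction would plug these values into each baseline's original training script, as the paper says it strictly follows the baselines' settings.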