Learning Mask Invariant Mutual Information for Masked Image Modeling
Authors: Tao Huang, Yanxiang Ma, Shan You, Chang Xu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on standard benchmarks show that MI-MAE significantly outperforms MAE models in tasks such as image classification, object detection, and semantic segmentation. Our findings validate the theoretical framework and highlight the practical advantages of applying the information bottleneck principle to MAEs, offering deeper insights for developing more powerful self-supervised learning models. |
| Researcher Affiliation | Collaboration | ¹School of Computer Science, Faculty of Engineering, The University of Sydney; ²SenseTime Research |
| Pseudocode | Yes | Algorithm 1 Self-supervised pre-training with MI-MAE. Our changes to MAE are marked with *. Input: Encoder E, decoder D, variational distribution approximation network V with parameters θ, training dataset Dtr, number of masks per image N. |
| Open Source Code | No | The paper does not provide an explicit statement about the release of source code or a link to a code repository. |
| Open Datasets | Yes | Image classification. Our method is developed based on the official code of MAE (He et al., 2022). We strictly adhere to the original pre-training and fine-tuning settings on ImageNet-1K (Russakovsky et al., 2015). Object detection. We transfer the pre-trained ViT models to the COCO (Lin et al., 2014) dataset. Semantic segmentation. We conduct semantic segmentation experiments on the ADE20K (Zhou et al., 2017) dataset, using the same settings as in MAE (He et al., 2022). |
| Dataset Splits | Yes | We strictly adhere to the original pre-training and fine-tuning settings on ImageNet-1K (Russakovsky et al., 2015). We transfer the pre-trained ViT models to the COCO (Lin et al., 2014) dataset. We adopt the Mask R-CNN framework (He et al., 2017), which predicts detections and instance segmentations simultaneously. We follow the model setup and training strategy used in ViTDet (Li et al., 2022b). |
| Hardware Specification | Yes | All our experiments use NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions several frameworks and optimizers such as AdamW, Mask R-CNN, and UperNet, but does not specify their version numbers or other software dependencies with version information. |
| Experiment Setup | Yes | Pre-training. ... We pre-train the models using an AdamW optimizer (Loshchilov & Hutter, 2019) with β1 = 0.9, β2 = 0.95, and a weight decay of 0.05. The total batch size is 1024 ... We use a cosine decay learning rate schedule with a 10-epoch warmup and a base learning rate of 1.5 × 10⁻⁴. For the hyper-parameters introduced by our MI-MAE, we set λ1 = λ2 = 1 and λ3 = 10. |
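The pseudocode row notes that Algorithm 1 extends MAE by drawing N masks per image. A minimal sketch of that sampling step is below; the 75% mask ratio is MAE's default and an assumption here, since the excerpt only states that N masks are drawn per image.

```python
import random

def sample_masks(num_patches, n_masks, mask_ratio=0.75, seed=None):
    """Return n_masks boolean lists; True marks a masked (hidden) patch.

    Sketch of MI-MAE's per-image multi-mask sampling; mask_ratio=0.75
    follows MAE's default and is not stated in the excerpt above.
    """
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    masks = []
    for _ in range(n_masks):
        # choose a fresh random subset of patches to hide for each mask
        hidden = rng.sample(range(num_patches), num_masked)
        mask = [False] * num_patches
        for i in hidden:
            mask[i] = True
        masks.append(mask)
    return masks
```

For a 14×14 ViT patch grid (196 patches), each of the N masks hides 147 patches, and the encoder sees a different visible subset per mask.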
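The experiment-setup row specifies a cosine decay learning-rate schedule with a 10-epoch warmup and a base learning rate of 1.5 × 10⁻⁴. A hedged sketch of that schedule is below; the 400-epoch total is a typical MAE pre-training length and an assumption, as the excerpt does not state it.

```python
import math

def lr_at_epoch(epoch, base_lr=1.5e-4, warmup_epochs=10, total_epochs=400):
    """Learning rate at a (possibly fractional) epoch.

    Linear warmup to base_lr over the first warmup_epochs, then cosine
    decay to zero; total_epochs=400 is an assumed value.
    """
    if epoch < warmup_epochs:
        # linear warmup from 0 up to base_lr
        return base_lr * epoch / warmup_epochs
    # cosine decay from base_lr down to 0 over the remaining epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Note that in MAE's convention the base learning rate is further scaled by `batch_size / 256` to obtain the effective rate; with the stated batch size of 1024 that scaling factor would be 4.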