Extreme Masking for Learning Instance and Distributed Visual Representations
Authors: Zhirong Wu, Zihang Lai, Xiao Sun, Stephen Lin
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the experiments, we systematically study the model behavior under different masking ratios, its convergence properties using multiple masks on larger datasets, and integration with various other data augmentations. Based on the study observations, we also propose a new augmentation scheme which uses shared image crops but different colors for the two input views. Our main results on ImageNet-1k outperform prior masked modeling approaches on both finetuning and linear probing metrics. |
| Researcher Affiliation | Collaboration | Zhirong Wu (Microsoft Research Asia), Zihang Lai (Carnegie Mellon University), Xiao Sun (Microsoft Research Asia), Stephen Lin (Microsoft Research Asia) |
| Pseudocode | No | The paper describes the ExtreMA approach and its components like Extreme Masking, Distributed and Instance Representations, and Learning Objective in prose. No structured pseudocode or algorithm blocks are provided. |
| Open Source Code | No | The paper does not explicitly state that source code for the methodology is provided or publicly available, nor does it include any links to a code repository. |
| Open Datasets | Yes | Our main results on ImageNet-1k outperform prior masked modeling approaches on both finetuning and linear probing metrics... We therefore study multi-masking on ImageNet-22k... We evaluate semantic segmentation performance on the ADE20K (Zhou et al., 2017) dataset... We evaluate the transfer performance on the MSCOCO dataset. |
| Dataset Splits | Yes | We pretrain the representation on ImageNet and evaluate it on finetuning (ft) and linear probe (lin) in our ablations. We finetune the model on top of the distributed representation, and conduct linear probes with the instance representation. The evaluation protocol mainly follows BEiT and MAE... Given the pretrained model, we use a small fraction of the ImageNet-1k training labels (1% or 10%) for semi-supervised finetuning. |
| Hardware Specification | Yes | Notably, this is achieved by training ExtreMA using a single node of 8 V100 GPUs in about two days for a ViT-Base model. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer, but does not provide specific version numbers for any software libraries, frameworks (like PyTorch or TensorFlow), or programming languages. |
| Experiment Setup | Yes | We use the original ViT-Base (Dosovitskiy et al., 2021) as the backbone architecture without the layer scale technique (Touvron et al., 2021b). The class attention follows the original design in (Touvron et al., 2021b) with a default of two transformer blocks and a layer scale hyper-parameter of 0.1. We train our model using the AdamW optimizer (Loshchilov & Hutter, 2018) with a batch size of 2048, an initial base learning rate of 1.5e-4, and a weight decay of 0.1. The exponential averaging weight for the momentum encoder is initialized to 0.996 and increased to 1.0 following a cosine schedule. The default augmentation is random resized cropping and random flipping. All models are trained for 300 epochs. Further details are provided in Table 13 and Table 14. |
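The momentum-encoder schedule quoted in the Experiment Setup row (EMA weight initialized to 0.996 and increased to 1.0 following a cosine schedule) can be sketched as below. This is a minimal illustration, not the authors' released code; the function names and the per-step granularity of the schedule are assumptions.

```python
import math

def momentum_schedule(step: int, total_steps: int, base: float = 0.996) -> float:
    """Cosine schedule for the momentum-encoder EMA weight.

    Starts at `base` (0.996 per the paper) at step 0 and rises to 1.0
    at `total_steps`. The step granularity is an assumption here.
    """
    progress = step / total_steps
    return 1.0 - (1.0 - base) * 0.5 * (1.0 + math.cos(math.pi * progress))

def ema_update(teacher: list, student: list, m: float) -> list:
    """One EMA update of momentum-encoder parameters (plain floats for clarity)."""
    return [m * t + (1.0 - m) * s for t, s in zip(teacher, student)]
```

For example, halfway through training the weight is 0.998, and the teacher parameters move only a small fraction of the way toward the student at each step.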