Dual PatchNorm

Authors: Manoj Kumar, Mostafa Dehghani, Neil Houlsby

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on image classification, contrastive learning, semantic segmentation and transfer on downstream classification datasets show that incorporating this trivial modification often leads to improved accuracy over well-tuned vanilla Vision Transformers and never hurts.
Researcher Affiliation | Industry | Manoj Kumar (EMAIL), Mostafa Dehghani (EMAIL), Neil Houlsby (EMAIL); Google Research, Brain Team
Pseudocode | Yes |

    hp, wp = patch_size[0], patch_size[1]
    x = einops.rearrange(
        x, "b (ht hp) (wt wp) c -> b (ht wt) (hp wp c)", hp=hp, wp=wp)
    x = nn.LayerNorm(name="ln0")(x)
    x = nn.Dense(output_features, name="dense")(x)
    x = nn.LayerNorm(name="ln1")(x)
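The paper's snippet uses einops and Flax modules. As a framework-free illustration of the same computation, the sketch below mirrors it in plain NumPy: patchify, LayerNorm, linear projection, LayerNorm again. The `layer_norm` helper, the random weights, and the 64-dimensional embedding size are stand-ins introduced here for illustration, not values from the paper.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize over the last axis (per token), as flax.linen.LayerNorm does
    # by default (scale/bias parameters omitted for brevity).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def dual_patchnorm_embed(x, w, b, hp=16, wp=16):
    """Patchify images and apply LayerNorm before AND after the projection."""
    bsz, h, wid, c = x.shape
    ht, wt = h // hp, wid // wp
    # Equivalent of einops "b (ht hp) (wt wp) c -> b (ht wt) (hp wp c)".
    x = x.reshape(bsz, ht, hp, wt, wp, c)
    x = x.transpose(0, 1, 3, 2, 4, 5).reshape(bsz, ht * wt, hp * wp * c)
    x = layer_norm(x)      # "ln0": norm on raw patch pixels
    x = x @ w + b          # "dense": linear patch embedding
    return layer_norm(x)   # "ln1": norm on embedded patches

rng = np.random.default_rng(0)
imgs = rng.normal(size=(2, 32, 32, 3))          # two 32x32 RGB images
w = rng.normal(size=(16 * 16 * 3, 64)) * 0.02   # hypothetical 64-dim embedding
tokens = dual_patchnorm_embed(imgs, w, np.zeros(64))
print(tokens.shape)  # (2, 4, 64): 2 images, 4 patches each, 64 dims
```

Because the final LayerNorm runs over the embedding axis, every output token is normalized to zero mean and unit variance regardless of the projection's scale.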
Open Source Code | No | The paper mentions external libraries such as big-vision, Scenic, and einops, and includes a small code snippet in the introduction. However, it does not state that the authors' implementation of the Dual PatchNorm method is publicly available, nor does it link to such a repository.
Open Datasets | Yes | We train ViT architectures (with and without DPN) in a supervised fashion on 3 different datasets with varying number of examples: ImageNet-1k (1M), ImageNet-21k (21M) and JFT (4B) (Zhai et al., 2022a). ... We finetune ImageNet-pretrained B/16 and B/32 with and without DPN on the Visual Task Adaptation benchmark (VTAB) (Zhai et al., 2019). ... We finetune ImageNet-pretrained B/16 with and without DPN on the ADE-20K 512×512 (Zhou et al., 2019) semantic segmentation task.
Dataset Splits | Yes | We split the ImageNet train set into a train and validation split, and use the validation split to arrive at the final DPN recipe. ... We use the VTAB training protocol which defines a standard train split of 800 examples and a validation split of 200 examples per dataset.
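The quoted VTAB protocol fixes per-dataset split sizes of 800 train / 200 validation examples. A minimal sketch of such a split is below; the `vtab_split` helper and the seeded shuffle are illustrative assumptions (the actual VTAB tooling defines canonical, reproducible splits).

```python
import random

def vtab_split(examples, n_train=800, n_val=200, seed=0):
    """Illustrative VTAB-style split: 800 train / 200 validation examples
    per dataset, drawn without overlap after a seeded shuffle."""
    rng = random.Random(seed)
    idx = list(range(len(examples)))
    rng.shuffle(idx)
    train = [examples[i] for i in idx[:n_train]]
    val = [examples[i] for i in idx[n_train:n_train + n_val]]
    return train, val

train, val = vtab_split(list(range(5000)))
print(len(train), len(val))  # 800 200
```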
Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU models, CPU types, or other accelerator specifications.
Software Dependencies | No | The paper mentions the big-vision (Beyer et al., 2022c), Scenic (Dehghani et al., 2022), and einops (Rogozhnikov, 2022) libraries, and refers to the jax library. However, it does not provide version numbers for these components, which a reproducible description of ancillary software requires.
Experiment Setup | Yes | We train 5 architectures: Ti/16, S/16, S/32, B/16 and B/32 using the AugReg (Steiner et al., 2022) recipe for 93,000 steps with a batch size of 4096... Our full set of hyperparameters are available in Appendix C and Appendix D. ...

    config.input.batch_size = 4096
    config.total_epochs = 300
    config.lr = 0.001
    config.wd = 0.0001
    config.schedule = dict(warmup_steps=10_000, decay_type='cosine')
    config.optax_name = 'scale_by_adam'
    config.grad_clip_norm = 1.0
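The quoted config specifies `lr=0.001`, `warmup_steps=10_000`, and `decay_type='cosine'` over 93,000 total steps. A common reading of these settings is linear warmup followed by cosine decay to zero, sketched below; big_vision's exact schedule implementation may differ in details such as a final learning-rate floor.

```python
import math

# Hyperparameters quoted from the paper's config excerpt above.
BASE_LR = 0.001
WARMUP_STEPS = 10_000
TOTAL_STEPS = 93_000

def learning_rate(step):
    """Linear warmup to BASE_LR, then cosine decay to zero over the
    remaining steps (an assumed, standard warmup+cosine schedule)."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(learning_rate(5_000))   # halfway through warmup: 0.0005
print(learning_rate(10_000))  # peak learning rate: 0.001
print(learning_rate(93_000))  # end of training: 0.0
```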