Separable Self-attention for Mobile Vision Transformers

Authors: Sachin Mehta, Mohammad Rastegari

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on standard vision datasets and tasks demonstrate the effectiveness of the proposed method (Fig. 2).
Researcher Affiliation | Industry | Sachin Mehta (Apple Inc.), Mohammad Rastegari (Apple Inc.)
Pseudocode | No | The paper describes mathematical operations (e.g., Eq. 1 and 2) and architectural diagrams (Fig. 3, 4, and 6) but does not contain explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps.
Open Source Code | Yes | Our source code is available at: https://github.com/apple/ml-cvnets.
Open Datasets | Yes | We train MobileViTv2 for 300 epochs... on the ImageNet-1k dataset (Russakovsky et al., 2015)... study its performance on the MS-COCO dataset (Lin et al., 2014)... two standard semantic segmentation datasets, ADE20k (Zhou et al., 2017) and PASCAL VOC 2012 (Everingham et al., 2015).
Dataset Splits | Yes | We train MobileViTv2 for 300 epochs with an effective batch size of 1024 images... on the ImageNet-1k dataset (Russakovsky et al., 2015) with 1.28 million and 50 thousand training and validation images, respectively. ...split it into about 11 million and 522 thousand training and validation images spanning over 10,450 classes, respectively. ...Table 11: Configuration for finetuning MobileViTv2 on downstream tasks. Dataset: MS-COCO ... # Training samples: 117k; # Validation samples: 5k
Hardware Specification | Yes | These results are computed on a single CPU core of a machine with a 2.4 GHz 8-Core Intel Core i9 processor... Here, inference time is measured on an iPhone 12... throughput is measured on NVIDIA V100 GPUs...
Software Dependencies | No | The paper mentions PyTorch and CVNets but does not specify their version numbers or other software dependencies with version details.
Experiment Setup | Yes | We train MobileViTv2 for 300 epochs with an effective batch size of 1024 images (128 images per GPU × 8 GPUs) using AdamW (Loshchilov & Hutter, 2019) on the ImageNet-1k dataset... We linearly increase the learning rate from 10^-6 to 0.002 for the first 20k iterations. After that, the learning rate is decayed using a cosine annealing policy (Loshchilov & Hutter, 2017). Tables 9, 10, and 11 provide extensive details on training configurations for various tasks and datasets.
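The quoted schedule (linear warmup from 10^-6 to a peak of 0.002 over the first 20k iterations, then cosine annealing) can be sketched as a small function. The total iteration count and the final learning rate below are assumptions, not values stated in the paper; 375k is inferred from 300 epochs × 1.28M images / batch size 1024 ≈ 1250 iterations per epoch.

```python
import math

def lr_at_iter(it, warmup_iters=20_000, total_iters=375_000,
               warmup_start=1e-6, peak_lr=0.002, min_lr=0.0):
    """Linear warmup followed by cosine annealing, per the quoted setup.

    total_iters (inferred from epochs/batch size) and min_lr are
    assumptions; the paper's quote specifies only the warmup length
    (20k iterations), warmup start (1e-6), and peak LR (0.002).
    """
    if it < warmup_iters:
        # Linear warmup from warmup_start up to peak_lr.
        return warmup_start + (peak_lr - warmup_start) * it / warmup_iters
    # Cosine annealing from peak_lr down to min_lr over the remaining iterations.
    progress = (it - warmup_iters) / max(1, total_iters - warmup_iters)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

In a PyTorch training loop, the equivalent behavior is usually obtained by chaining a linear-warmup scheduler with `torch.optim.lr_scheduler.CosineAnnealingLR`; the standalone function above just makes the shape of the schedule explicit.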