DetNAS: Backbone Search for Object Detection

Authors: Yukang Chen, Tong Yang, Xiangyu Zhang, Gaofeng Meng, Xinyu Xiao, Jian Sun

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In experiments, we show the effectiveness of DetNAS on various detectors, for instance, the one-stage RetinaNet and the two-stage FPN. We empirically find that networks searched on object detection show consistent superiority over those searched on ImageNet classification. The resulting architecture achieves superior performance to hand-crafted networks on COCO with much lower FLOPs complexity."
Researcher Affiliation | Collaboration | (1) National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; (2) Megvii Technology
Pseudocode | Yes | "We formulate the supernet training process as Algorithm 1 in the supplementary material. We formulate this process as Algorithm 2 in the supplementary material."
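The algorithms themselves appear only in the paper's supplementary material. As a rough illustration of what Algorithm 1 describes, single-path supernet training samples one random path (one candidate block per layer) each iteration and updates only that path's weights. The function below is a hypothetical sketch, not the paper's code; `train_step` is a placeholder for one SGD update on the sampled path:

```python
import random

def train_supernet(num_layers, num_choices, num_iters, train_step, seed=0):
    """Single-path one-shot supernet training sketch.

    Each iteration draws a uniformly random path through the supernet
    (one choice index per layer) and calls `train_step(path)`, which is
    assumed to run one gradient update on that path's weights only.
    Returns the list of sampled paths for inspection.
    """
    rng = random.Random(seed)
    sampled = []
    for _ in range(num_iters):
        # A path is one candidate-block index per searchable layer.
        path = [rng.randrange(num_choices) for _ in range(num_layers)]
        train_step(path)
        sampled.append(path)
    return sampled
```

In the actual method, this path-wise training is what makes the supernet weights reusable for evaluating many candidate architectures without retraining each one.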
Open Source Code | Yes | "Code and models have been made available at: https://github.com/megvii-model/DetNAS."
Open Datasets | Yes | "For the ImageNet classification dataset, we use the commonly used 1.28M training images for supernet pre-training. We train on 8 GPUs with a total of 16 images per minibatch for 90k iterations on COCO and 22.5k iterations on VOC."
Dataset Splits | Yes | "We split the detection datasets into a training set for supernet fine-tuning, a validation set for architecture search, and a test set for final evaluation. For VOC, the validation set contains 5k images randomly selected from trainval2007 + trainval2012, with the remainder used for supernet fine-tuning. For COCO, the validation set contains 5k images randomly selected from trainval35k [13], with the remainder used for supernet fine-tuning."
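The split described above amounts to a seeded random partition of the detection training set: 5k images held out for search validation, the remainder kept for supernet fine-tuning. A minimal sketch (function name, seed, and signature are illustrative, not from the paper):

```python
import random

def split_for_search(image_ids, val_size=5000, seed=0):
    """Randomly partition a detection training set into a search-validation
    subset of `val_size` images and a supernet fine-tuning subset with the
    remainder, mirroring the paper's 5k random split."""
    rng = random.Random(seed)
    ids = list(image_ids)
    rng.shuffle(ids)
    return ids[:val_size], ids[val_size:]
```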
Hardware Specification | Yes | "For the small search space, the GPUs are GTX 1080Ti. For the large search space, the GPUs are Tesla V100."
Software Dependencies | No | The paper mentions "Detectron [6]" but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | "For the ImageNet classification dataset, we use the commonly used 1.28M training images for supernet pre-training. To train the one-shot supernet backbone on ImageNet, we use a batch size of 1024 on 8 GPUs for 300k iterations. We set the initial learning rate to be 0.5 and decrease it linearly to 0. The momentum is 0.9 and weight decay is 4×10⁻⁵. We train on 8 GPUs with a total of 16 images per minibatch for 90k iterations on COCO and 22.5k iterations on VOC. The initial learning rate is 0.02, which is divided by 10 at {60k, 80k} iterations on COCO and {15k, 20k} iterations on VOC. We use weight decay of 1×10⁻⁴ and momentum of 0.9."
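The two learning-rate schedules quoted above (linear decay to 0 for ImageNet supernet pre-training; step decay divided by 10 at fixed milestones for detection fine-tuning) can be written as simple functions of the iteration index. This is an illustrative sketch of the stated hyperparameters, not code from the released repository:

```python
def imagenet_lr(it, total_iters=300_000, base_lr=0.5):
    """Linear decay from base_lr to 0 over supernet pre-training."""
    return base_lr * (1.0 - it / total_iters)

def detection_lr(it, base_lr=0.02, milestones=(60_000, 80_000)):
    """Step schedule: divide the LR by 10 at each milestone passed.

    Defaults match the COCO fine-tuning setting; for VOC the quoted
    milestones would be (15_000, 20_000) with the same base LR.
    """
    lr = base_lr
    for m in milestones:
        if it >= m:
            lr /= 10.0
    return lr
```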