DECO: Unleashing the Potential of ConvNets for Query-based Detection and Segmentation
Authors: Xinghao Chen, Siwei Li, Yijing Yang, Yunhe Wang
ICLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare the proposed DECO against prior detectors on the challenging COCO benchmark. Despite its simplicity, our DECO achieves competitive performance in terms of detection accuracy and running speed. Specifically, with the ResNet-18 and ResNet-50 backbones, our DECO achieves 40.5% and 47.8% AP with 66 and 34 FPS, respectively. The proposed method is also evaluated on the segment anything task, demonstrating similar performance and higher efficiency. |
| Researcher Affiliation | Collaboration | Xinghao Chen¹, Siwei Li¹,², Yijing Yang¹, Yunhe Wang¹ (¹Huawei Noah's Ark Lab, ²Tsinghua University) |
| Pseudocode | No | The paper describes the proposed InterConv mechanism and DECO architecture through detailed text descriptions and architectural diagrams (Figure 2 and Figure 3), but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Codes are available at https://github.com/xinghaochen/DECO and https://github.com/mindspore-lab/models/tree/master/research/huawei-noah/DECO. |
| Open Datasets | Yes | We evaluate the proposed DECO on the challenging object detection benchmark, i.e., COCO (Lin et al., 2014). |
| Dataset Splits | No | The paper mentions using the COCO benchmark and following similar training settings as DETR, and details image resizing for augmentation, but does not explicitly state the training, validation, or test splits for the dataset (e.g., percentages or sample counts for each split). |
| Hardware Specification | Yes | The latency is measured on a NVIDIA V100 GPU. ... The FPS we report is the average number of the first 100 images in the COCO 2017 val set on a NVIDIA V100 GPU. |
| Software Dependencies | No | The paper mentions using the 'AdamW optimizer' but does not specify its version or any other software dependencies with version numbers. |
| Experiment Setup | Yes | For the vanilla DECO, we follow similar training settings as DETR (Carion et al., 2020). We train the proposed DECO models for 150 epochs using the AdamW optimizer, with a weight decay of 10⁻⁴ and initial learning rates of 10⁻⁴ and 10⁻⁵ for the encoder-decoder and backbone, respectively. The learning rate is dropped by a factor of 10 after 100 epochs. The augmentation scheme is the same as DETR, which includes random horizontal flipping, random crop augmentation, and scale augmentation. In the scale augmentation, the shorter side of the input image is resized to a random size between 480 and 800 pixels while restricting the longer side to at most 1333. As for DECO+, which is equipped with multi-scale feature fusion, the training image size is selected between 480 and 800 with a stride of 32, following the RT-DETR baseline. The inference size is set to 640×640. |
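The training schedule quoted in the Experiment Setup row can be sketched as a small helper. This is a minimal illustrative snippet (the function and constant names are hypothetical, not taken from the DECO codebase), assuming the reported settings: initial learning rates of 10⁻⁴ for the encoder-decoder and 10⁻⁵ for the backbone, both dropped by a factor of 10 after epoch 100 of 150.

```python
# Hypothetical sketch of the reported DECO learning-rate schedule.
# Assumptions (from the paper's text, not its code): two parameter groups
# with different base LRs, and a single step drop of 10x after epoch 100.

BASE_LRS = {"encoder_decoder": 1e-4, "backbone": 1e-5}
DROP_EPOCH = 100   # learning rate is divided by 10 from this epoch onward
TOTAL_EPOCHS = 150

def lr_at_epoch(group: str, epoch: int) -> float:
    """Return the learning rate for a parameter group at a given epoch."""
    lr = BASE_LRS[group]
    return lr * 0.1 if epoch >= DROP_EPOCH else lr

# Example: full per-epoch schedule for the encoder-decoder group.
schedule = [lr_at_epoch("encoder_decoder", e) for e in range(TOTAL_EPOCHS)]
print(schedule[0])    # base LR before the drop
print(schedule[120])  # LR after the factor-of-10 drop
```

In a framework such as PyTorch, the same effect would typically be achieved with two optimizer parameter groups and a step scheduler; the sketch above only makes the reported numbers concrete.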